{"day":"14","external_id":{"arxiv":["2210.08031"]},"status":"public","year":"2022","oa":1,"date_published":"2022-10-14T00:00:00Z","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","article_processing_charge":"No","department":[{"_id":"FrLo"}],"date_created":"2023-08-22T13:57:27Z","month":"10","citation":{"chicago":"Rahaman, Nasim, Martin Weiss, Francesco Locatello, Chris Pal, Yoshua Bengio, Bernhard Schölkopf, Li Erran Li, and Nicolas Ballas. “Neural Attentive Circuits.” In 36th Conference on Neural Information Processing Systems, Vol. 35, 2022.","ista":"Rahaman N, Weiss M, Locatello F, Pal C, Bengio Y, Schölkopf B, Li LE, Ballas N. 2022. Neural attentive circuits. 36th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, Advances in Neural Information Processing Systems, vol. 35.","apa":"Rahaman, N., Weiss, M., Locatello, F., Pal, C., Bengio, Y., Schölkopf, B., … Ballas, N. (2022). Neural attentive circuits. In 36th Conference on Neural Information Processing Systems (Vol. 35). New Orleans, United States.","ama":"Rahaman N, Weiss M, Locatello F, et al. Neural attentive circuits. In: 36th Conference on Neural Information Processing Systems. Vol 35. ; 2022.","mla":"Rahaman, Nasim, et al. “Neural Attentive Circuits.” 36th Conference on Neural Information Processing Systems, vol. 35, 2022.","short":"N. Rahaman, M. Weiss, F. Locatello, C. Pal, Y. Bengio, B. Schölkopf, L.E. Li, N. Ballas, in:, 36th Conference on Neural Information Processing Systems, 2022.","ieee":"N. Rahaman et al., “Neural attentive circuits,” in 36th Conference on Neural Information Processing Systems, New Orleans, United States, 2022, vol. 35."},"alternative_title":[" Advances in Neural Information Processing Systems"],"title":"Neural attentive circuits","conference":{"name":"NeurIPS: Neural Information Processing Systems","location":"New Orleans, United States","end_date":"2022-12-01","start_date":"2022-11-29"},"author":[{"last_name":"Rahaman","full_name":"Rahaman, Nasim","first_name":"Nasim"},{"last_name":"Weiss","first_name":"Martin","full_name":"Weiss, Martin"},{"orcid":"0000-0002-4850-0683","full_name":"Locatello, Francesco","first_name":"Francesco","id":"26cfd52f-2483-11ee-8040-88983bcc06d4","last_name":"Locatello"},{"last_name":"Pal","full_name":"Pal, Chris","first_name":"Chris"},{"full_name":"Bengio, Yoshua","first_name":"Yoshua","last_name":"Bengio"},{"last_name":"Schölkopf","full_name":"Schölkopf, Bernhard","first_name":"Bernhard"},{"first_name":"Li Erran","full_name":"Li, Li Erran","last_name":"Li"},{"last_name":"Ballas","full_name":"Ballas, Nicolas","first_name":"Nicolas"}],"main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2210.08031"}],"_id":"14168","oa_version":"Preprint","abstract":[{"lang":"eng","text":"Recent work has seen the development of general purpose neural architectures\r\nthat can be trained to perform tasks across diverse data modalities. General\r\npurpose models typically make few assumptions about the underlying\r\ndata-structure and are known to perform well in the large-data regime. At the\r\nsame time, there has been growing interest in modular neural architectures that\r\nrepresent the data using sparsely interacting modules. These models can be more\r\nrobust out-of-distribution, computationally efficient, and capable of\r\nsample-efficient adaptation to new data. However, they tend to make\r\ndomain-specific assumptions about the data, and present challenges in how\r\nmodule behavior (i.e., parameterization) and connectivity (i.e., their layout)\r\ncan be jointly learned.\r\nIn this work, we introduce a general purpose, yet\r\nmodular neural architecture called Neural Attentive Circuits (NACs) that\r\njointly learns the parameterization and a sparse connectivity of neural modules\r\nwithout using domain knowledge. NACs are best understood as the combination of\r\ntwo systems that are jointly trained end-to-end: one that determines the module\r\nconfiguration and the other that executes it on an input. We demonstrate\r\nqualitatively that NACs learn diverse and meaningful module configurations on\r\nthe NLVR2 dataset without additional supervision. Quantitatively, we show that\r\nby incorporating modularity in this way, NACs improve upon a strong non-modular\r\nbaseline in terms of low-shot adaptation on CIFAR and CUBs dataset by about\r\n10%, and OOD robustness on Tiny ImageNet-R by about 2.5%. Further, we find that\r\nNACs can achieve an 8x speedup at inference time while losing less than 3%\r\nperformance. Finally, we find NACs to yield competitive results on diverse data\r\nmodalities spanning point-cloud classification, symbolic processing and\r\ntext-classification from ASCII bytes, thereby confirming its general purpose\r\nnature."}],"publication":"36th Conference on Neural Information Processing Systems","publication_status":"published","type":"conference","extern":"1","volume":35,"date_updated":"2023-09-11T09:29:09Z","language":[{"iso":"eng"}],"intvolume":" 35"}