publications | Aryo Lotfi

2023

Generalization on the Unseen, Logic Reasoning and Degree Curriculum

Emmanuel Abbe, Samy Bengio, Aryo Lotfi, and Kevin Rizk

In Proceedings of the 40th International Conference on Machine Learning, 23–29 Jul 2023

This paper considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization. This is motivated by the fact that the rich combinatorial nature of data in certain reasoning tasks (e.g., arithmetic/logic) makes representative data sampling challenging, and learning successfully under GOTU gives a first vignette of an ‘extrapolating’ or ‘reasoning’ learner. We then study how different network architectures trained by (S)GD perform under GOTU and provide both theoretical and experimental evidence that for a class of network models including instances of Transformers, random features models, and diagonal linear networks, a min-degree-interpolator is learned on the unseen. We also provide evidence that other instances with larger learning rates or mean-field networks reach leaky min-degree solutions. These findings lead to two implications: (1) we provide an explanation for the length generalization problem (e.g., Anil et al. 2022); (2) we introduce a curriculum learning algorithm called Degree-Curriculum that learns monomials more efficiently by incrementing supports.
@inproceedings{pmlr-v202-abbe23a,
  title = {Generalization on the Unseen, Logic Reasoning and Degree Curriculum},
  author = {Abbe, Emmanuel and Bengio, Samy and Lotfi, Aryo and Rizk, Kevin},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {31--60},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  url = {https://proceedings.mlr.press/v202/abbe23a.html},
}

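As a rough illustration of the min-degree interpolator mentioned in the abstract above, the sketch below handles only the simplest case, where one coordinate is frozen to a fixed value in the training data. Under that assumption, substituting the frozen value into every monomial gives the lowest-degree polynomial that agrees with the target on the seen inputs. The polynomial representation and the names `min_degree_interpolator` and `degree` are illustrative choices, not taken from the paper or its code.

```python
from collections import defaultdict


def min_degree_interpolator(poly, frozen_index, frozen_value=1):
    """Substitute x[frozen_index] = frozen_value into a multilinear polynomial.

    `poly` maps a frozenset of variable indices (a monomial over {-1, +1}
    inputs) to its coefficient. This representation is an assumption made
    for illustration only.
    """
    reduced = defaultdict(float)
    for monomial, coeff in poly.items():
        if frozen_index in monomial:
            # The frozen variable is constant on the seen data, so it folds
            # into the coefficient and the monomial loses one degree.
            reduced[monomial - {frozen_index}] += coeff * frozen_value
        else:
            reduced[monomial] += coeff
    # Drop monomials whose coefficients cancelled out.
    return {m: c for m, c in reduced.items() if c != 0}


def degree(poly):
    """Largest monomial size with a non-zero coefficient."""
    return max((len(m) for m in poly), default=0)


# Target f(x) = x0 * x1 + x2, with training data restricted to x0 = +1.
target = {frozenset({0, 1}): 1.0, frozenset({2}): 1.0}
interpolator = min_degree_interpolator(target, frozen_index=0)

print(degree(target))        # 2
print(degree(interpolator))  # 1, i.e. x1 + x2
```

On the unseen half of the cube (x0 = -1) the target equals -x1 + x2 while this interpolator still outputs x1 + x2, which is the kind of mismatch the GOTU setting is meant to expose.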
Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs

Emmanuel Abbe, Elisabetta Cornacchia, and Aryo Lotfi

In Advances in Neural Information Processing Systems, 2023

Experimental results have shown that curriculum learning, i.e., presenting simpler examples before more complex ones, can improve the efficiency of learning. Some recent theoretical results also showed that changing the sampling distribution can help neural networks learn parities, with formal results only for large learning rates and one-step arguments. Here we show a separation result in the number of training steps with standard (bounded) learning rates on a common sample distribution: if the data distribution is a mixture of sparse and dense inputs, there exists a regime in which a 2-layer ReLU neural network trained by a curriculum noisy-GD (or SGD) algorithm that uses sparse examples first, can learn parities of sufficiently large degree, while any fully connected neural network of possibly larger width or depth trained by noisy-GD on the unordered samples cannot learn without additional steps. We also provide experimental results supporting the qualitative separation beyond the specific regime of the theoretical results.
@inproceedings{neurips-parity-curriculum,
  author = {Abbe, Emmanuel and Cornacchia, Elisabetta and Lotfi, Aryo},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {Oh, A. and Neumann, T. and Globerson, A. and Saenko, K. and Hardt, M. and Levine, S.},
  pages = {24291--24321},
  publisher = {Curran Associates, Inc.},
  title = {Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs},
  volume = {36},
  year = {2023},
}

2022

Learning to Reason with Neural Networks: Generalization, Unseen Data and Boolean Measures

Emmanuel Abbe, Samy Bengio, Elisabetta Cornacchia, Jon Kleinberg, Aryo Lotfi, Maithra Raghu, and Chiyuan Zhang

In Advances in Neural Information Processing Systems, 2022

This paper considers the Pointer Value Retrieval (PVR) benchmark introduced in [ZRKB21], where a ‘reasoning’ function acts on a string of digits to produce the label. More generally, the paper considers the learning of logical functions with gradient descent (GD) on neural networks. It is first shown that in order to learn logical functions with gradient descent on symmetric neural networks, the generalization error can be lower-bounded in terms of the noise-stability of the target function, supporting a conjecture made in [ZRKB21]. It is then shown that in the distribution shift setting, when the data withholding corresponds to freezing a single feature (referred to as canonical holdout), the generalization error of gradient descent admits a tight characterization in terms of the Boolean influence for several relevant architectures. This is shown on linear models and supported experimentally on other models such as MLPs and Transformers. In particular, this puts forward the hypothesis that for such architectures and for learning logical functions such as PVR functions, GD tends to have an implicit bias towards low-degree representations, which in turn gives the Boolean influence for the generalization error under quadratic loss.
@inproceedings{neurips-boolean-pvr,
  author = {Abbe, Emmanuel and Bengio, Samy and Cornacchia, Elisabetta and Kleinberg, Jon and Lotfi, Aryo and Raghu, Maithra and Zhang, Chiyuan},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {Koyejo, S. and Mohamed, S. and Agarwal, A. and Belgrave, D. and Cho, K. and Oh, A.},
  pages = {2709--2722},
  publisher = {Curran Associates, Inc.},
  title = {Learning to Reason with Neural Networks: Generalization, Unseen Data and Boolean Measures},
  volume = {35},
  year = {2022},
}
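For readers unfamiliar with the Boolean influence referred to in the abstract above, the following minimal sketch computes it by brute force using the standard definition: the influence of coordinate i on f is the fraction of inputs in {-1, +1}^n for which flipping x_i changes f(x). The example functions and the helper name `boolean_influence` are illustrative and are not drawn from the paper.

```python
from itertools import product
from math import prod


def boolean_influence(f, n, i):
    """Fraction of inputs x in {-1, +1}^n where flipping x[i] changes f(x)."""
    changed = 0
    for x in product((-1, 1), repeat=n):
        flipped = list(x)
        flipped[i] = -flipped[i]
        if f(x) != f(tuple(flipped)):
            changed += 1
    return changed / 2 ** n


def majority3(x):
    """Majority vote of three +/-1 inputs."""
    return 1 if sum(x) > 0 else -1


def parity(x):
    """Product of all +/-1 inputs."""
    return prod(x)


def dictator(x):
    """Depends on the first coordinate only."""
    return x[0]


print([boolean_influence(majority3, 3, i) for i in range(3)])  # [0.5, 0.5, 0.5]
print([boolean_influence(parity, 3, i) for i in range(3)])     # [1.0, 1.0, 1.0]
print([boolean_influence(dictator, 3, i) for i in range(3)])   # [1.0, 0.0, 0.0]
```

Enumerating all 2^n inputs is only practical for small n; it is used here purely to make the definition concrete.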

2021

Semi-Supervised Disentanglement of Class-Related and Class-Independent Factors in VAE

Sina Hajimiri, Aryo Lotfi, and Mahdieh Soleymani Baghshah

arXiv preprint arXiv:2102.00892, 2021

In recent years, extending the variational autoencoder framework to learn disentangled representations has received much attention. We address this problem by proposing a framework capable of disentangling class-related and class-independent factors of variation in data. Our framework employs an attention mechanism in its latent space in order to improve the process of extracting class-related factors from data. We also deal with the multimodality of the data distribution by utilizing mixture models as learnable prior distributions, and by incorporating the Bhattacharyya coefficient in the objective function to prevent highly overlapping mixtures. Our model’s encoder is further trained in a semi-supervised manner, with a small fraction of labeled data, to improve the interpretability of the representations. Experiments show that our framework disentangles class-related and class-independent factors of variation and learns interpretable features. Moreover, we demonstrate our model’s performance with quantitative and qualitative results on various datasets.
@misc{hajimiri2021semisupervised,
  title = {Semi-Supervised Disentanglement of Class-Related and Class-Independent Factors in VAE},
  author = {Hajimiri, Sina and Lotfi, Aryo and Baghshah, Mahdieh Soleymani},
  year = {2021},
  eprint = {2102.00892},
  archiveprefix = {arXiv},
  primaryclass = {cs.LG},
}
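The abstract above mentions the Bhattacharyya coefficient as a way to discourage overlapping mixture components. As a small sketch of what that quantity measures, the code below evaluates the closed-form coefficient between two univariate Gaussians. The univariate setting and the function name are assumptions made for illustration; the paper's actual objective and component distributions are not specified here.

```python
import math


def bhattacharyya_coefficient(mu1, sigma1, mu2, sigma2):
    """Bhattacharyya coefficient between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Uses the closed form BC = exp(-D_B), where D_B is the Bhattacharyya
    distance between two univariate Gaussians. Values lie in (0, 1], and
    1 means the two densities coincide.
    """
    var_sum = sigma1 ** 2 + sigma2 ** 2
    distance = (
        0.25 * (mu1 - mu2) ** 2 / var_sum
        + 0.5 * math.log(var_sum / (2.0 * sigma1 * sigma2))
    )
    return math.exp(-distance)


identical = bhattacharyya_coefficient(0.0, 1.0, 0.0, 1.0)
nearby = bhattacharyya_coefficient(0.0, 1.0, 1.0, 1.0)
far_apart = bhattacharyya_coefficient(0.0, 1.0, 5.0, 1.0)

print(identical)           # 1.0
print(nearby > far_apart)  # True: closer components overlap more
```

A penalty that grows with this coefficient would push component means apart or sharpen their variances, which is one plausible way to read "prevent highly overlapping mixtures".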