When can transformers compositionally generalize in-context?

Bibliographic Details
Main Authors: Kobayashi, Seijin, Schug, Simon, Akram, Yassir, Redhardt, Florian, von Oswald, Johannes, Pascanu, Razvan, Lajoie, Guillaume, Sacramento, João
Format: Journal Article
Language: English
Published: 16-07-2024
Description
Summary: Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of which might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us to precisely control compositional structure in the data generation process. We present evidence that transformers learning in-context struggle to generalize compositionally on this task despite being in principle expressive enough to do so. Compositional generalization becomes possible only when introducing a bottleneck that enforces an explicit separation between task inference and task execution.
DOI: 10.48550/arxiv.2407.12275
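
The summary above describes a data-generation process in which each task is composed from a few independent components and only a subset of component combinations is seen during training. Below is a minimal sketch of what such a setup might look like; the choice of linear modules, the number of slots and modules, and the 50/50 split over combinations are illustrative assumptions, not the authors' exact construction.

```python
# Hypothetical sketch of a compositional multitask generator in the spirit of
# the paper's setting. All specifics (linear modules, slot/module counts,
# random train/test split) are assumptions for illustration.
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

N_MODULES = 4   # options per component slot (assumption)
N_SLOTS = 2     # independent components per task (assumption)
DIM = 8         # input/output dimensionality (assumption)

# Each module is a random linear map; a task composes one module per slot.
modules = rng.standard_normal((N_SLOTS, N_MODULES, DIM, DIM)) / np.sqrt(DIM)

# Enumerate all module combinations, then hold some out to test compositional
# generalization from a subset of tasks to unseen combinations.
all_tasks = list(product(range(N_MODULES), repeat=N_SLOTS))
rng.shuffle(all_tasks)
train_tasks = all_tasks[: len(all_tasks) // 2]  # seen during training
test_tasks = all_tasks[len(all_tasks) // 2:]    # held-out combinations

def task_fn(task, x):
    """Apply the composed task: one linear module per slot, in order."""
    for slot, idx in enumerate(task):
        x = x @ modules[slot, idx]
    return x

def sample_episode(task, n_context=16):
    """In-context episode: (x, y) demonstrations drawn from one latent task."""
    xs = rng.standard_normal((n_context, DIM))
    ys = task_fn(task, xs)
    return xs, ys

# A model trained on episodes from train_tasks generalizes compositionally
# if it also solves episodes from the held-out test_tasks.
xs, ys = sample_episode(train_tasks[0])
```

In such a setup, a learner that internally factors each episode into identifying which modules are active (task inference) and then applying them (task execution) can recombine modules it has only seen in other combinations; this separation is what the bottleneck described in the abstract is meant to enforce.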