When can transformers compositionally generalize in-context?
Format: Journal Article
Language: English
Published: 16-07-2024
Summary: Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of which might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us to precisely control compositional structure in the data generation process. We present evidence that transformers learning in-context struggle to generalize compositionally on this task despite being in principle expressive enough to do so. Compositional generalization becomes possible only when introducing a bottleneck that enforces an explicit separation between task inference and task execution.
DOI: 10.48550/arxiv.2407.12275
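The abstract describes a modular multitask setting in which each task is a composition of a few independent components and only a subset of combinations appears during training. As a purely illustrative sketch (the paper's actual data-generation code, component definitions, and episode format are not given here, so every name and parameter below is a hypothetical stand-in), one way such a generation process could look in Python:

```python
# Illustrative sketch (not the paper's code): a toy modular multitask
# data-generation process. Each task is a composition of independent
# component functions, and only a subset of all component combinations
# is seen during "training"; held-out combinations probe compositional
# generalization. All component choices and parameters are hypothetical.
import itertools
import random

# Hypothetical pool of independent components acting on integer lists.
COMPONENTS = {
    "negate":  lambda xs: [-x for x in xs],
    "reverse": lambda xs: list(reversed(xs)),
    "shift":   lambda xs: [x + 1 for x in xs],
    "double":  lambda xs: [2 * x for x in xs],
}

def make_task(names):
    """Compose the named components (applied left to right) into one task."""
    def task(xs):
        for name in names:
            xs = COMPONENTS[name](xs)
        return xs
    return task

def sample_episode(task, num_examples=4, length=3, rng=random):
    """Build an in-context episode: (input, output) pairs plus a query."""
    pairs = []
    for _ in range(num_examples):
        xs = [rng.randint(-5, 5) for _ in range(length)]
        pairs.append((xs, task(xs)))
    query = [rng.randint(-5, 5) for _ in range(length)]
    return pairs, query, task(query)

# All two-component tasks; hold some combinations out of training so that
# evaluation on them tests generalization to unseen compositions.
all_combos = list(itertools.permutations(COMPONENTS, 2))
random.seed(0)
random.shuffle(all_combos)
train_combos, test_combos = all_combos[:8], all_combos[8:]

if __name__ == "__main__":
    held_out = test_combos[0]
    pairs, query, target = sample_episode(make_task(held_out))
    print("held-out task:", held_out)
    for xs, ys in pairs:
        print("  ", xs, "->", ys)
    print("query:", query, "target:", target)
```

In such a setup, a model trained in-context on episodes from `train_combos` can be evaluated on `test_combos`: components it has seen, in combinations it has not, which is the kind of compositional generalization the abstract refers to.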