A Flexible State Space Model for Large Language Models: The GroupMamba Approach
Published in: Sensors and Materials, Vol. 36, No. 10, p. 4283
Main Authors:
Format: Journal Article
Language: English
Published: Tokyo: MYU Scientific Publishing Division, 01-01-2024
Summary: Transformers have consistently excelled in large language models owing to their exceptional scalability, efficient parallel processing, superior contextual comprehension, and versatility across a wide range of tasks. In recent years, state space models (SSMs) have also seen notable advancements, with the Mamba model standing out for its efficient parallel processing and low computational complexity. Despite these strengths, however, SSMs, including Mamba, often struggle to match the performance of transformers in tasks that require deep contextual understanding and the handling of high-dimensional data. In this paper, we introduce GroupMamba, a novel group-based SSM designed to optimize the trade-off between complexity and parallel processing capability by strategically grouping SSM modules. These groupings can be customized to suit various tasks, effectively blending the strengths of the Mamba and transformer architectures. Experimental results demonstrate that GroupMamba achieves consistent improvements across diverse tasks, including a notable 2% increase in accuracy on public benchmark tests. This work marks a significant advancement in the integration of SSMs and transformers, offering a more adaptable, scalable, and efficient solution for complex natural language processing challenges.
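The summary's central idea is partitioning SSM modules into groups so that complexity and parallelism can be traded off per task. The paper's actual implementation is not reproduced in this record, so the following is only a minimal PyTorch sketch of that idea under stated assumptions: each group runs an independent diagonal linear SSM over its slice of the channel dimension, and a pointwise projection mixes the groups afterward. All names here (SimpleSSM, GroupedSSMBlock, num_groups, state_size) are illustrative, not the authors' API.

import torch
import torch.nn as nn

class SimpleSSM(nn.Module):
    # Minimal diagonal linear state-space layer:
    #   h_t = exp(A) * h_{t-1} + B x_t,   y_t = C h_t
    # (a deliberately simplified stand-in for a Mamba-style SSM module)
    def __init__(self, dim, state_size=16):
        super().__init__()
        self.A = nn.Parameter(-torch.rand(state_size))  # negative log-decay for stability
        self.B = nn.Linear(dim, state_size, bias=False)
        self.C = nn.Linear(state_size, dim, bias=False)

    def forward(self, x):
        # x: (batch, seq_len, dim); sequential scan kept for clarity, not speed
        batch, seq_len, _ = x.shape
        h = x.new_zeros(batch, self.A.shape[0])
        decay = torch.exp(self.A)  # elementwise decay in (0, 1)
        ys = []
        for t in range(seq_len):
            h = decay * h + self.B(x[:, t])
            ys.append(self.C(h))
        return torch.stack(ys, dim=1)

class GroupedSSMBlock(nn.Module):
    # Splits the channel dimension into groups, runs an independent SSM
    # per group (parallelizable across groups), then mixes the groups with
    # a pointwise projection so information still flows between them.
    def __init__(self, dim, num_groups=4, state_size=16):
        super().__init__()
        assert dim % num_groups == 0, "dim must be divisible by num_groups"
        self.num_groups = num_groups
        self.ssms = nn.ModuleList(
            SimpleSSM(dim // num_groups, state_size) for _ in range(num_groups)
        )
        self.mix = nn.Linear(dim, dim)

    def forward(self, x):
        chunks = x.chunk(self.num_groups, dim=-1)  # one channel slice per group
        y = torch.cat([ssm(c) for ssm, c in zip(self.ssms, chunks)], dim=-1)
        return self.mix(y)

For example, with dim=512 and num_groups=4, each SSM scans a 128-channel slice, so per-group state computation shrinks while the groups can run in parallel; the mixing projection then recombines them. This illustrates the kind of complexity/parallelism dial the summary describes, though whether the paper forms its groupings this way is not stated in the record.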
ISSN: 0914-4935; 2435-0869
DOI: 10.18494/SAM5309