A Flexible State Space Model for Large Language Models: The GroupMamba Approach

Bibliographic Details
Published in: Sensors and Materials, Vol. 36, No. 10, p. 4283
Main Authors: Liu, Xiling; Ruan, Qunsheng; Wu, Yingjia; Chen, Kai; Yang, Cheng-Fu
Format: Journal Article
Language: English
Published: Tokyo: MYU Scientific Publishing Division, 01-01-2024
Description
Summary: Transformers have consistently excelled in large language models owing to their exceptional scalability, efficient parallel processing, superior contextual comprehension, and versatility across a wide range of tasks. In recent years, state space models (SSMs) have also seen notable advancements, with the Mamba model standing out for its efficient parallel processing capabilities and low computational complexity. Despite these strengths, however, SSMs, including Mamba, often struggle to match the performance of transformers in tasks that require deep contextual understanding and the handling of high-dimensional data. In this paper, we introduce GroupMamba, a novel group-based SSM designed to optimize the trade-off between computational complexity and parallel processing by strategically grouping SSM modules. These groupings can be customized to suit various tasks, effectively blending the strengths of the Mamba and transformer architectures. Experimental results demonstrate that GroupMamba achieves significant improvements across diverse tasks, including a notable 2% increase in accuracy on public benchmarks. This work marks a significant advancement in the integration of SSMs and transformers, offering a more adaptable, scalable, and efficient solution to complex natural language processing challenges.
ISSN: 0914-4935; 2435-0869
DOI: 10.18494/SAM5309
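
The abstract describes the grouping mechanism only at a high level. As a rough illustration of the idea, here is a minimal sketch in PyTorch that partitions the hidden dimension into groups, each running an independent simplified (diagonal, linear time-invariant) state space scan. The class name GroupedSSM, the parameterization, and all design details are hypothetical assumptions made for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a "grouped SSM" layer: the hidden dimension is split
# into n_groups, and each group maintains its own recurrent state under a
# simplified diagonal, linear time-invariant SSM. Not the paper's code.
import torch
import torch.nn as nn


class GroupedSSM(nn.Module):
    def __init__(self, d_model: int, n_groups: int, d_state: int = 16):
        super().__init__()
        assert d_model % n_groups == 0, "d_model must be divisible by n_groups"
        self.n_groups = n_groups
        self.d_group = d_model // n_groups
        # Per-group diagonal state decay and input/output projections.
        self.log_a = nn.Parameter(torch.randn(n_groups, d_state) * 0.1)
        self.b = nn.Parameter(torch.randn(n_groups, self.d_group, d_state) * 0.02)
        self.c = nn.Parameter(torch.randn(n_groups, d_state, self.d_group) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> (batch, seq_len, n_groups, d_group)
        bsz, seq_len, _ = x.shape
        xg = x.view(bsz, seq_len, self.n_groups, self.d_group)
        decay = torch.exp(-torch.exp(self.log_a))  # per-channel decay in (0, 1)
        h = x.new_zeros(bsz, self.n_groups, self.log_a.shape[-1])
        ys = []
        for t in range(seq_len):
            # Each group updates its own recurrent state independently.
            u = torch.einsum("bgd,gds->bgs", xg[:, t], self.b)
            h = decay * h + u
            ys.append(torch.einsum("bgs,gsd->bgd", h, self.c))
        # Recombine the group outputs into the full hidden dimension.
        return torch.stack(ys, dim=1).reshape(bsz, seq_len, -1)


x = torch.randn(2, 8, 64)
print(GroupedSSM(d_model=64, n_groups=4)(x).shape)  # torch.Size([2, 8, 64])
```

A selective SSM such as Mamba would additionally make the state transition input-dependent and replace the Python loop with a parallel scan; the loop is kept here only for readability.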