Analysis of Transformer's Attention Behavior in Sleep Stage Classification and Limiting It to Improve Performance
Published in: IEEE Access, Vol. 12, pp. 95914-95925
Main Authors: , , ,
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024
Summary: The transformer architecture has been applied to many tasks, including natural language processing and vision. The most common prerequisite for using a transformer-based architecture is that the model be pretrained on a large-scale dataset before it is fine-tuned for a specific task such as classification or object detection. In this paper, however, we find that the transformer architecture generalizes better than CNN-based architectures when capturing features from data samples for sleep stage classification, despite being trained on a small-scale dataset without large-scale pretraining. This outcome contradicts the widely held belief that a transformer architecture is effective only when trained on large datasets. We investigate the attention behavior of a transformer model and demonstrate how global and local attention influence the attention map in a transformer architecture. Finally, through experiments on three different datasets, we show that restricting global attention using Masked Multi-Head Self-Attention (M-MHSA) improves model generalization in sleep stage classification compared with previous methodologies and the original transformer-based architecture.
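The summary describes restricting global attention with a masked multi-head self-attention so that each position attends only to nearby positions. The paper's exact masking scheme is not reproduced here; the following is a minimal single-head sketch, assuming a fixed-width banded (diagonal) mask in which a hypothetical `window` parameter bounds how far apart two sequence positions may be and still attend to each other.

```python
import torch
import torch.nn.functional as F

# Illustrative stand-in for local-attention masking, NOT the
# paper's M-MHSA implementation (its mask shape is an assumption).

def banded_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where attention is BLOCKED: positions more than
    # `window` steps apart cannot attend to each other.
    idx = torch.arange(seq_len)
    return (idx[:, None] - idx[None, :]).abs() > window

def masked_self_attention(x: torch.Tensor, w_qkv: torch.Tensor,
                          window: int) -> torch.Tensor:
    # x: (batch, seq_len, d_model); w_qkv: (d_model, 3 * d_model).
    # Single-head scaled dot-product attention with a local mask.
    q, k, v = (x @ w_qkv).chunk(3, dim=-1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (b, n, n)
    scores = scores.masked_fill(banded_mask(x.shape[1], window),
                                float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Usage: a sequence of 30 epochs with 64-dim features, each epoch
# attending only to its +/- 3 neighbors.
x = torch.randn(2, 30, 64)
w_qkv = torch.randn(64, 3 * 64)
out = masked_self_attention(x, w_qkv, window=3)  # (2, 30, 64)
```

Masking the score matrix with `-inf` before the softmax zeroes out the blocked positions' attention weights, which is the standard way to restrict global attention without changing the rest of the transformer block.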
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3424236