Optimizing Vision Transformers for Medical Image Segmentation
Published in: | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 1–5 |
---|---|
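Main Authors: | Liu, Qianying; Kaul, Chaitanya; Wang, Jun; Anagnostopoulos, Christos; Murray-Smith, Roderick; Deligianni, Fani |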
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 4 June 2023 |
Abstract | For medical image semantic segmentation (MISS), Vision Transformers have emerged as strong alternatives to convolutional neural networks thanks to their inherent ability to capture long-range correlations. However, existing research uses off-the-shelf vision Transformer blocks based on linear projections and feature processing, which lack the spatial and local context needed to refine organ boundaries. Furthermore, Transformers do not generalize well on small medical imaging datasets and rely on large-scale pre-training because of their limited inductive biases. To address these problems, we design CS-Unet, a compact and accurate Transformer network for MISS that introduces convolutions in a multi-stage design to hierarchically enhance the spatial and local modeling ability of Transformers. This is mainly achieved by our Convolutional Swin Transformer (CST) block, which merges convolutions with Multi-Head Self-Attention and Feed-Forward Networks to provide inherent localized spatial context and inductive biases. Experiments demonstrate that CS-Unet, without pre-training, outperforms other counterparts by large margins on multi-organ and cardiac datasets with fewer parameters and achieves state-of-the-art performance. Our code is available on GitHub¹. |
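The abstract states only that the CST block merges convolutions with Multi-Head Self-Attention and Feed-Forward Networks; it does not say exactly where the convolutions sit. As a hedged illustration of why this adds local context, the pure-Python sketch below contrasts a per-token linear projection (the kind of projection the abstract attributes to off-the-shelf ViT blocks) with a depthwise 3×3 convolution over the token grid. The helper names (`linear_projection`, `depthwise_conv3x3`, `conv_projection`, `conv_ffn`) and the placement of the convolution (before a linear projection, and between the two linear layers of the FFN) are assumptions modeled on common convolutional-Transformer designs, not the authors' CS-Unet code.

```python
import math
from typing import List

# Token grid laid out as grid[h][w][c]: H x W token positions, C channels each.
Grid = List[List[List[float]]]


def linear_projection(grid: Grid, weight: List[List[float]]) -> Grid:
    """Per-token linear map, weight[c_in][c_out], as in a plain ViT projection.

    Each output token depends only on the input token at the same position,
    so no neighbourhood information is mixed in.
    """
    c_out = len(weight[0])
    return [
        [[sum(tok[i] * weight[i][o] for i in range(len(tok))) for o in range(c_out)]
         for tok in row]
        for row in grid
    ]


def depthwise_conv3x3(grid: Grid, kernels: List[List[List[float]]]) -> Grid:
    """Depthwise 3x3 convolution (cross-correlation), stride 1, zero padding.

    kernels[c] is the 3x3 kernel for channel c; every output token aggregates
    its 3x3 spatial neighbourhood, which injects local context.
    """
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ch in range(c):
                acc = 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:  # zero padding outside
                            acc += grid[yy][xx][ch] * kernels[ch][dy + 1][dx + 1]
                out[y][x][ch] = acc
    return out


def conv_projection(grid: Grid, kernels, weight) -> Grid:
    """Hypothetical convolutional projection: depthwise 3x3 conv, then linear map."""
    return linear_projection(depthwise_conv3x3(grid, kernels), weight)


def gelu(v: float) -> float:
    """Exact GELU activation."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def conv_ffn(grid: Grid, w1, kernels, w2) -> Grid:
    """Hypothetical convolutional FFN with a residual connection:
    x + Linear2(GELU(DWConv3x3(Linear1(x)))).
    w2 must map back to the input channel count for the residual sum.
    """
    hidden = depthwise_conv3x3(linear_projection(grid, w1), kernels)
    hidden = [[[gelu(v) for v in tok] for tok in row] for row in hidden]
    out = linear_projection(hidden, w2)
    return [
        [[a + b for a, b in zip(t_in, t_out)] for t_in, t_out in zip(r_in, r_out)]
        for r_in, r_out in zip(grid, out)
    ]


if __name__ == "__main__":
    # 5x5 grid, one channel, a single non-zero token in the centre.
    g = [[[0.0] for _ in range(5)] for _ in range(5)]
    g[2][2][0] = 1.0
    identity = [[1.0]]
    ones = [[[1.0] * 3 for _ in range(3)]]  # one 3x3 all-ones kernel
    lin = linear_projection(g, identity)
    conv = conv_projection(g, ones, identity)
    print(sum(t[0] != 0.0 for row in lin for t in row))   # 1
    print(sum(t[0] != 0.0 for row in conv for t in row))  # 9
```

With a single non-zero token, the linear projection leaves exactly one non-zero output token, while the convolutional projection spreads it over its 3×3 neighbourhood (nine tokens). In a real model one would use a framework layer instead, for example PyTorch's `nn.Conv2d` with `groups` set to the channel count for the depthwise step.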
Author | Deligianni, Fani; Liu, Qianying; Kaul, Chaitanya; Wang, Jun; Murray-Smith, Roderick; Anagnostopoulos, Christos |
Author_xml | 1. Liu, Qianying (University of Glasgow, School of Computing Science); 2. Kaul, Chaitanya (University of Glasgow, School of Computing Science); 3. Wang, Jun (University of Warwick); 4. Anagnostopoulos, Christos (University of Glasgow, School of Computing Science); 5. Murray-Smith, Roderick (University of Glasgow, School of Computing Science); 6. Deligianni, Fani (University of Glasgow, School of Computing Science) |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DOI | 10.1109/ICASSP49357.2023.10096379 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library Online; IEEE Proceedings Order Plans (POP) 1998-present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library Online url: http://ieeexplore.ieee.org/Xplore/DynWel.jsp sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering |
EISBN | 1728163277 9781728163277 |
EISSN | 2379-190X |
EndPage | 5 |
ExternalDocumentID | 10096379 |
Genre | orig-research |
GrantInformation_xml | – fundername: Royal Society funderid: 10.13039/501100000288 |
GroupedDBID | 23M 6IE 6IF 6IH 6IK 6IL 6IM 6IN AAJGR ABLEC ACGFS ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP IPLJI JC5 M43 OCL RIE RIL RIO RNS |
ID | FETCH-LOGICAL-i1709-1a67ecbd6ca0944e240516249e6779d12b9193cb69e8a1ccabd6071b4c88128d3 |
IEDL.DBID | RIE |
IngestDate | Wed Jun 26 19:24:39 EDT 2024 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i1709-1a67ecbd6ca0944e240516249e6779d12b9193cb69e8a1ccabd6071b4c88128d3 |
OpenAccessLink | https://doi.org/10.1109/icassp49357.2023.10096379 |
PageCount | 5 |
ParticipantIDs | ieee_primary_10096379 |
PublicationCentury | 2000 |
PublicationDate | 2023-June-4 |
PublicationDateYYYYMMDD | 2023-06-04 |
PublicationDate_xml | – month: 06 year: 2023 text: 2023-June-4 day: 04 |
PublicationDecade | 2020 |
PublicationTitle | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
PublicationTitleAbbrev | ICASSP |
PublicationYear | 2023 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0008748 |
Score | 2.3143404 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 1 |
SubjectTerms | Convolution; Convolutional codes; Convolutional neural networks; Convolutions; Correlation; Medical Image Segmentation; Merging; Semantic segmentation; Transformers; Vision Transformer |
Title | Optimizing Vision Transformers for Medical Image Segmentation |
URI | https://ieeexplore.ieee.org/document/10096379 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | IEEE |