Optimizing Vision Transformers for Medical Image Segmentation

Bibliographic Details
Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5
Main Authors: Liu, Qianying; Kaul, Chaitanya; Wang, Jun; Anagnostopoulos, Christos; Murray-Smith, Roderick; Deligianni, Fani
Format: Conference Proceeding
Language: English
Published: IEEE, 4 June 2023
Abstract For medical image semantic segmentation (MISS), Vision Transformers have emerged as strong alternatives to convolutional neural networks thanks to their inherent ability to capture long-range correlations. However, existing research uses off-the-shelf Vision Transformer blocks based on linear projections and feature processing, which lack the spatial and local context needed to refine organ boundaries. Furthermore, Transformers do not generalize well on small medical imaging datasets and rely on large-scale pre-training because of their limited inductive biases. To address these problems, we demonstrate the design of a compact and accurate Transformer network for MISS, CS-Unet, which introduces convolutions in a multi-stage design to hierarchically enhance the spatial and local modeling ability of Transformers. This is mainly achieved by our Convolutional Swin Transformer (CST) block, which merges convolutions with Multi-Head Self-Attention and Feed-Forward Networks to provide inherent localized spatial context and inductive biases. Experiments demonstrate that CS-Unet, without pre-training, outperforms its counterparts by large margins on multi-organ and cardiac datasets with fewer parameters and achieves state-of-the-art performance. Our code is available on GitHub.
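As a rough illustration of the idea in the abstract (feeding self-attention with convolutional rather than linear projections, so that each token already carries local spatial context), the following is a minimal pure-Python sketch. It is not the authors' CST block: the function names, the depthwise 3x3 kernels, the single attention head, and the omission of windowing, normalization, and the feed-forward network are all assumptions made for illustration.

```python
import math


def depthwise_conv3x3(fmap, kernel):
    """Apply one 3x3 kernel per channel with zero padding.

    fmap is a nested list shaped [C][H][W]; kernel is shaped [C][3][3].
    (Hypothetical helper, standing in for a convolutional projection.)
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        y, x = i + di, j + dj
                        if 0 <= y < H and 0 <= x < W:  # zero padding at borders
                            acc += fmap[c][y][x] * kernel[c][di + 1][dj + 1]
                out[c][i][j] = acc
    return out


def to_tokens(fmap):
    """Flatten a [C][H][W] map into H*W tokens with C features each."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(C)] for i in range(H) for j in range(W)]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def conv_attention(fmap, kq, kk, kv):
    """Single-head self-attention whose Q, K, V come from depthwise convolutions.

    Returns (output tokens, attention matrix). Assumption: global attention
    over all H*W tokens, with no shifted windows as in Swin Transformers.
    """
    q = to_tokens(depthwise_conv3x3(fmap, kq))
    k = to_tokens(depthwise_conv3x3(fmap, kk))
    v = to_tokens(depthwise_conv3x3(fmap, kv))
    scale = 1.0 / math.sqrt(len(q[0]))
    out, attn = [], []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        attn.append(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(qi))])
    return out, attn
```

With a kernel whose only non-zero entry is a 1 at the center, each projection reduces to the identity, which gives a quick sanity check; any other kernel mixes each pixel with its eight neighbours before attention is computed.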
Author Deligianni, Fani
Liu, Qianying
Kaul, Chaitanya
Wang, Jun
Murray-Smith, Roderick
Anagnostopoulos, Christos
Author_xml – sequence: 1
  givenname: Qianying
  surname: Liu
  fullname: Liu, Qianying
  organization: University of Glasgow, School of Computing Science
– sequence: 2
  givenname: Chaitanya
  surname: Kaul
  fullname: Kaul, Chaitanya
  organization: University of Glasgow, School of Computing Science
– sequence: 3
  givenname: Jun
  surname: Wang
  fullname: Wang, Jun
  organization: University of Warwick
– sequence: 4
  givenname: Christos
  surname: Anagnostopoulos
  fullname: Anagnostopoulos, Christos
  organization: University of Glasgow, School of Computing Science
– sequence: 5
  givenname: Roderick
  surname: Murray-Smith
  fullname: Murray-Smith, Roderick
  organization: University of Glasgow, School of Computing Science
– sequence: 6
  givenname: Fani
  surname: Deligianni
  fullname: Deligianni, Fani
  organization: University of Glasgow, School of Computing Science
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/ICASSP49357.2023.10096379
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library Online
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library Online
  url: http://ieeexplore.ieee.org/Xplore/DynWel.jsp
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISBN 1728163277
9781728163277
EISSN 2379-190X
EndPage 5
ExternalDocumentID 10096379
Genre orig-research
GrantInformation_xml – fundername: Royal Society
  funderid: 10.13039/501100000288
IEDL.DBID RIE
IngestDate Wed Jun 26 19:24:39 EDT 2024
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
OpenAccessLink https://doi.org/10.1109/icassp49357.2023.10096379
PageCount 5
ParticipantIDs ieee_primary_10096379
PublicationCentury 2000
PublicationDate 2023-June-4
PublicationDateYYYYMMDD 2023-06-04
PublicationDate_xml – month: 06
  year: 2023
  text: 2023-June-4
  day: 04
PublicationDecade 2020
PublicationTitle ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
PublicationTitleAbbrev ICASSP
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0008748
Score 2.3143404
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Convolution
Convolutional codes
Convolutional neural networks
Convolutions
Correlation
Medical Image Segmentation
Merging
Semantic segmentation
Transformers
Vision Transformer
Title Optimizing Vision Transformers for Medical Image Segmentation
URI https://ieeexplore.ieee.org/document/10096379
hasFullText 1
inHoldings 1