Two-stream vision transformer based multi-label recognition for TCM prescriptions construction

Bibliographic Details
Published in: Computers in Biology and Medicine, Vol. 170, p. 107920
Main Authors: Zhao, Zijuan, Qiang, Yan, Yang, Fenghao, Hou, Xiao, Zhao, Juanjuan, Song, Kai
Format: Journal Article
Language: English
Published: United States Elsevier Ltd 01-03-2024
Elsevier Limited
Description
Summary: Traditional Chinese medicine (TCM) observation diagnosis images (including facial and tongue images) provide essential information about the human body and are of significant importance in clinical diagnosis and treatment. TCM prescriptions, known for their simplicity, non-invasiveness, and low side effects, have been widely applied worldwide. Automated herbal prescription construction based on visual diagnosis is valuable both for studying the correlation between external features and herbal prescriptions and for offering medical services in mobile healthcare systems. To effectively integrate multi-perspective visual diagnosis images and automate prescription construction, this study proposes a multi-herb recommendation framework based on Visual Transformer and multi-label classification. The framework comprises three key components: an image encoder, a label embedding module, and a cross-modal fusion classification module. The image encoder employs a dual-stream Visual Transformer to learn dependencies between different regions of the input images, capturing both local and global features. The label embedding module uses Graph Convolutional Networks to capture associations between diverse herbal labels. Finally, two Multi-Modal Factorized Bilinear modules fuse the cross-modal vectors, creating an end-to-end multi-label image-herb recommendation model. In experiments on real facial and tongue images, the model generates prescription data closely resembling real samples, with a precision of 50.06 %, a recall of 48.33 %, and an F1-score of 49.18 %. This study validates the feasibility of automated herbal prescription construction from the perspective of visual diagnosis. It also provides valuable insights for constructing herbal prescriptions automatically from additional physical information.
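The reported F1-score agrees with the harmonic mean of the reported precision and recall. The following is a minimal check, assuming the F1-score was derived from those two aggregate values; the record does not state which averaging scheme the authors used, and `f1_from` is a hypothetical helper name introduced only for illustration.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the summary, in percent.
f1 = f1_from(50.06, 48.33)
print(round(f1, 2))  # prints 49.18
```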
• A multi-label learning framework based on Visual Transformers and Graph Convolutional Networks is proposed for TCM herbal recommendation and prescription generation.
• A knowledge distillation strategy is integrated into the ViT framework to ensure consistency between image and label embeddings.
• The MFB component is improved to fuse the image representations generated by the ViT with the label co-occurrence embeddings produced by the GCN module.
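The fusion step named in the highlights can be illustrated with the generic Multi-Modal Factorized Bilinear (MFB) pooling operation: project both vectors, multiply them element-wise, sum-pool over windows of size k, then apply power and L2 normalization. This is a sketch of the standard MFB formulation only, not the improved component the authors describe; the function name `mfb_fuse`, the vector sizes, and the random projection weights are all assumptions made for illustration.

```python
import math
import random


def mfb_fuse(x, y, U, V, k):
    """Generic MFB pooling of an image vector x and a label vector y.

    U and V hold k * o projection vectors (one per row) for x and y.
    Returns a fused vector of length o.
    """
    # Project each modality into the shared (k * o)-dimensional space.
    px = [sum(u * a for u, a in zip(row, x)) for row in U]
    py = [sum(v * b for v, b in zip(row, y)) for row in V]
    # Element-wise product, then sum-pooling over non-overlapping windows of k.
    joint = [a * b for a, b in zip(px, py)]
    pooled = [sum(joint[i:i + k]) for i in range(0, len(joint), k)]
    # Power normalization (signed square root) followed by L2 normalization.
    powered = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in powered))
    return [v / norm for v in powered] if norm > 0 else powered


# Hypothetical sizes: 16-d image feature, 8-d label embedding, k = 5, o = 4.
random.seed(0)
d_img, d_lab, k, o = 16, 8, 5, 4
x = [random.gauss(0, 1) for _ in range(d_img)]
y = [random.gauss(0, 1) for _ in range(d_lab)]
U = [[random.gauss(0, 1) for _ in range(d_img)] for _ in range(k * o)]
V = [[random.gauss(0, 1) for _ in range(d_lab)] for _ in range(k * o)]
z = mfb_fuse(x, y, U, V, k)
print(len(z))  # prints 4
```

In a trained model the projections U and V would be learned parameters, and the fused vector would feed the multi-label classifier.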
ISSN: 0010-4825, 1879-0534
DOI: 10.1016/j.compbiomed.2024.107920