Causal Gene Identification Using Non-Linear Regression-Based Independence Tests




Bibliographic Details
Published in:	IEEE/ACM transactions on computational biology and bioinformatics Vol. 20; no. 1; pp. 185 - 195
Main Authors:	Zhang, Hao, Yan, Chuanxu, Xia, Yewei, Guan, Jihong, Zhou, Shuigeng
Format:	Journal Article
Language:	English
Published:	United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01-01-2023
Subjects:	Additive noise; Algorithms; Bioinformatics; Cancer; causal gene identification; causal inference; Disease; Diseases; Experiments; Feature extraction; Gene expression; Genes; Genetic disorders; Genomes; Health risks; Humans; Inference; Machine Learning; markov equivalence class; Markov processes; Neoplasms - genetics; Neoplasms - metabolism; Regression analysis; Search methods; Testing

Description
Summary:	With the development of biomedical techniques over the past decades, causal gene identification has become one of the most promising applications in human genome-based business: it can help doctors evaluate the risk of certain genetic diseases and recommend further treatment for potential patients. When controlled experiments cannot be applied, machine learning techniques such as causal inference-based methods are generally used to identify causal genes. Unfortunately, most existing methods detect disease-related genes with ranking-based strategies or feature selection techniques, which generally return a superset of the real causal genes. Some causal inference-based methods can identify part of the real causal genes within those supersets, but they return only a few causal genes. This contradicts existing knowledge: many controlled experiments have shown that a given disease, especially cancer, is usually related to dozens or hundreds of genes. In this work, we present an effective approach for identifying causal genes from gene expression data using a new search strategy based on non-linear regression-based independence tests, which greatly reduces the search space and simultaneously establishes the causal relationships from the candidate genes to the disease variable.
Extensive experiments on real-world cancer datasets show that our method is superior to existing causal inference-based methods in three aspects: 1) it can identify dozens of causal genes, and 1/3 to 1/2 of the discovered causal genes are confirmed by existing works to be directly related to the corresponding disease; 2) the discovered causal genes can distinguish the status or disease subtype of the target patient; 3) most of the discovered causal genes are closely relevant to the disease variable.
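The record does not describe the authors' search strategy or their exact test. As a rough illustration of the general idea of a non-linear regression-based independence test (in the additive-noise sense suggested by the subject terms), the Python sketch below regresses a putative effect on a putative cause with Nadaraya-Watson kernel regression, then uses a permutation test on the Hilbert-Schmidt Independence Criterion (HSIC) to check whether the residuals are independent of the cause. The regressor, the Gaussian kernels, the fixed bandwidths, the permutation count, and all function names (`hsic`, `hsic_pvalue`, `kernel_regression_residuals`, `regression_independence_pvalue`) are assumptions made for this example, not details taken from the paper.

```python
import math
import random


def _standardize(v):
    """Return a zero-mean, unit-variance copy of v (a constant vector is only centred)."""
    n = len(v)
    mean = sum(v) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in v) / n) or 1.0
    return [(a - mean) / sd for a in v]


def _centered_gram(v, bandwidth=1.0):
    """Gaussian-kernel Gram matrix K of v, double-centred as H K H (H = I - 11^T / n)."""
    n = len(v)
    k = [[math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in v] for a in v]
    mean_row = [sum(r) / n for r in k]  # K is symmetric, so row means equal column means
    grand = sum(mean_row) / n
    return [[k[i][j] - mean_row[i] - mean_row[j] + grand for j in range(n)]
            for i in range(n)]


def hsic(x, y):
    """Biased HSIC estimate trace(K H L H) / n^2; close to zero when x and y are independent."""
    kc, lc = _centered_gram(_standardize(x)), _centered_gram(_standardize(y))
    n = len(x)
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / n ** 2


def hsic_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value for H0: x independent of y, using the HSIC statistic."""
    kc, lc = _centered_gram(_standardize(x)), _centered_gram(_standardize(y))
    n = len(x)

    def stat(p):
        # Unnormalised HSIC after reordering the samples of y by permutation p.
        total = 0.0
        for i in range(n):
            krow, lrow = kc[i], lc[p[i]]
            total += sum(krow[j] * lrow[p[j]] for j in range(n))
        return total

    observed = stat(list(range(n)))
    rng = random.Random(seed)
    perm = list(range(n))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if stat(perm) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def kernel_regression_residuals(x, y, bandwidth=0.3):
    """Residuals y - f(x) of a Nadaraya-Watson (Gaussian-kernel) non-linear regression."""
    xs = _standardize(x)
    residuals = []
    for a, yi in zip(xs, y):
        w = [math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in xs]
        fitted = sum(wj * yj for wj, yj in zip(w, y)) / sum(w)  # sum(w) >= 1 (self-weight)
        residuals.append(yi - fitted)
    return residuals


def regression_independence_pvalue(cause, effect, n_perm=200, seed=0):
    """Additive-noise check: regress effect on cause, then test residuals against cause.

    Under the additive-noise assumption, a large p-value is consistent with
    effect = f(cause) + noise (noise independent of cause); a small one is
    evidence against that causal direction.
    """
    residuals = kernel_regression_residuals(cause, effect)
    return hsic_pvalue(cause, residuals, n_perm=n_perm, seed=seed)


if __name__ == "__main__":
    # Hypothetical toy data, not gene expression: y = x^3 + uniform noise.
    rng = random.Random(1)
    x = [rng.uniform(-2, 2) for _ in range(100)]
    y = [v ** 3 + rng.uniform(-1, 1) for v in x]
    print("x -> y p-value:", regression_independence_pvalue(x, y))
    print("y -> x p-value:", regression_independence_pvalue(y, x))
```

Running the test in both directions and comparing the p-values is the usual way such a test is used to orient an edge between two variables. With a simple smoother, fixed bandwidths, and small samples, the comparison is only indicative; a practical implementation would tune the regressor and kernel bandwidths rather than hard-code them as done here.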
ISSN:	1545-5963 1557-9964
DOI:	10.1109/TCBB.2022.3149864