Machine Learning-Based Artifact Detection for Long-Read Sequencing Data

Bibliographic Details
Published in: 2023 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 582-584
Main Authors: Mbuga, Felix; Lam, Kathy; Lee, Wendy
Format: Conference Proceeding
Language: English
Published: IEEE 13-12-2023
Description
Summary: A major goal of cancer diagnostics is to detect cancer at its earliest stages, when it is most curable. Minimally invasive methods are particularly attractive for this purpose, notably liquid biopsy, which analyzes tumor material shed into bodily fluids. Liquid biopsy methods that employ next-generation sequencing (NGS) of circulating tumor DNA (ctDNA) are challenged by a low signal-to-noise ratio: sequencing artifacts introduce noise at a level similar to that of the ctDNA signal. To address this, we propose using machine learning (ML) and deep learning (DL) techniques, drawing on Genome in a Bottle (GIAB) truth sets and data from the National Center for Biotechnology Information's Sequence Read Archive (NCBI SRA) database, to robustly identify sequencing artifacts. This approach holds promise for improving the accuracy and reliability of liquid biopsy-based cancer diagnostics.
ISSN: 2769-5654
DOI: 10.1109/CSCI62032.2023.00103