Assessing transmission attribution risk from simulated sequencing data in HIV molecular epidemiology

Bibliographic Details
Published in: AIDS (London), Vol. 38, no. 6, pp. 865-873
Main Authors: Nascimento, Fabrícia F; Mehta, Sanjay R; Little, Susan J; Volz, Erik M
Format: Journal Article
Language: English
Published: England, Lippincott Williams & Wilkins, 01-05-2024
Description
Summary: HIV molecular epidemiology (ME) is the analysis of sequence data together with individual-level clinical, demographic, and behavioral data to understand HIV epidemiology. The use of ME has raised concerns about identifying the putative source in direct transmission events, which could result in harm ranging from stigma to criminal prosecution in some jurisdictions. Here we assessed the risks of ME using simulated HIV genetic sequencing data. We simulated social networks of men who have sex with men, calibrating the simulations to data from San Diego. We used these networks to simulate consensus and next-generation sequence (NGS) data to evaluate the risk of identifying direct transmissions at different HIV sequence lengths and population sampling depths. To identify the source of transmissions, we calculated infector probabilities for the consensus data and used the phyloscanner software for the NGS data. Consensus sequence analyses showed that the risk of correctly inferring the source (direct transmission) within identified transmission pairs was very small and independent of sampling depth. By comparison, NGS analyses identified the source of a transmission very accurately, but only for 6.5% of inferred pairs. False positive transmissions were also observed: inferred pairs that, in the true network, were separated by one or more unobserved intermediaries. Source attribution using consensus sequences rarely infers direct transmission pairs with high confidence but is still useful for population studies. In contrast, source attribution using NGS data was much more accurate in identifying direct transmission pairs, but for only a small percentage of the transmission pairs analyzed.
ISSN: 0269-9370, 1473-5571
DOI: 10.1097/QAD.0000000000003820