Information Theory and Multivariate Techniques for Analyzing DNA Sequence Data: An Example from Tomato Genes

Bibliographic Details
Published in:Nepal journal of biotechnology Vol. 1; no. 1; pp. 1 - 8
Main Authors: Joshi, Bal K, Panthee, Dilip R
Format: Journal Article
Language:English
Published: Biotechnology Society of Nepal 01-10-2011
Description
Summary:DNA and amino acid sequences are strings of alphabetic symbols with no underlying metric. Information theory offers one solution to this sequence metric problem. How DNA sequence complexity is reflected in phenotype stability might be useful for crop improvement. The Shannon-Weaver index (Shannon entropy, H') and the mutual information (MI) index were estimated from the DNA sequences of 22 genes from two tomato gene families, namely disease resistance and fruit quality. The main objective was to use information theory and multivariate techniques to understand diversity among genes and to relate sequence complexity to phenotypes. The normalized H' value ranged from 0.429 to 0.461. The highest diversity was observed in the gene Crtr-B (beta carotene hydroxylase). Two principal components, which accounted for 36.65% of the variation, placed these genes into four groups. Groupings of these genes by both principal component and cluster analyses clearly showed similarity at the phenotype level within clusters. Sequence similarity among genes was observed within a family. Diversity assessment of genes using information theory should be linked to understanding sequence complexity with respect to gene stability, for example the stability of resistance genes.
Key words: Diversity analysis; DNA sequences; principal component analysis; tomato genes
Nepal Journal of Biotechnology, 2011, Vol. 1, No. 1 pp.1-9
ISSN:2091-1130
2467-9313
DOI:10.3126/njb.v1i1.3867
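
The two indices named in the summary, Shannon entropy (H') and mutual information (MI), can be illustrated with a minimal Python sketch. This is not the authors' procedure: the record does not state how H' was normalized or how MI was defined, so the sketch assumes a four-letter nucleotide alphabet, normalization by the maximum entropy log2(4), and MI between bases a fixed distance (`lag`) apart. The function names and the `lag` parameter are hypothetical, and values from this sketch should not be expected to match the 0.429 to 0.461 range reported above.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq, alphabet="ACGT"):
    """Shannon entropy (bits per base) of the base frequencies in seq."""
    # Assumption: symbols outside the alphabet (e.g. N, gaps) are ignored.
    counts = Counter(b for b in seq.upper() if b in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no symbols from the alphabet")
    return -sum((n / total) * log2(n / total) for n in counts.values())


def normalized_entropy(seq, alphabet="ACGT"):
    """Entropy divided by its maximum, log2(alphabet size).

    Assumption: this is one common normalization, not necessarily
    the one used in the article.
    """
    return shannon_entropy(seq, alphabet) / log2(len(alphabet))


def mutual_information(seq, lag=1, alphabet="ACGT"):
    """Mutual information (bits) between bases that are `lag` positions apart.

    Assumption: non-alphabet symbols are dropped before pairing, which
    slightly changes adjacency wherever such symbols occur.
    """
    bases = [b for b in seq.upper() if b in alphabet]
    pairs = list(zip(bases, bases[lag:]))
    if not pairs:
        raise ValueError("sequence too short for the requested lag")
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )
```

Under these assumptions, a sequence with equal base frequencies such as `"ACGT"` has a normalized entropy of 1, and a homopolymer such as `"AAAA"` has an entropy of 0. Per-gene values of this kind could then be assembled into a feature table for principal component or cluster analysis.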