Information Theory and Multivariate Techniques for Analyzing DNA Sequence Data: An Example from Tomato Genes
Published in: | Nepal journal of biotechnology Vol. 1; no. 1; pp. 1 - 8 |
---|---|
Main Authors: | , |
Format: | Journal Article |
Language: | English |
Published: | Biotechnology Society of Nepal, 01-10-2011 |
Subjects: | |
Summary: | DNA and amino acid sequences are strings of alphabetic symbols with no underlying metric. Information theory offers one solution to this sequence metric problem. How DNA sequence complexity is reflected in phenotype stability might be useful for crop improvement. The Shannon-Weaver index (Shannon entropy, H') and the mutual information (MI) index were estimated from the DNA sequences of 22 tomato genes belonging to two gene families, namely disease resistance and fruit quality. The main objective was to use information theory and multivariate techniques to understand diversity among genes and to relate sequence complexity to phenotypes. The normalized H' value ranged from 0.429 to 0.461. The highest diversity was observed in the gene Crtr-B (beta carotene hydroxylase). Two principal components, which accounted for 36.65% of the variation, placed these genes into four groups. Groupings of these genes by both principal component and cluster analyses clearly showed similarity at the phenotype level within clusters. Sequence similarity among genes was observed within a family. Diversity assessment of genes using information theory should be linked to understanding sequence complexity with respect to gene stability, for example the stability of resistance genes. Key words: Diversity analysis; DNA sequences; principal component analysis; tomato genes. Nepal Journal of Biotechnology, 2011, Vol. 1, No. 1 pp.1-9 |
---|---|
ISSN: | 2091-1130 2467-9313 |
DOI: | 10.3126/njb.v1i1.3867 |
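
The summary names two information-theoretic indices, Shannon entropy (H') and mutual information (MI), without stating how they were estimated. The following is a minimal sketch of one common way to compute them from a DNA string, assuming single-base frequencies for H', normalization by the maximum entropy of a four-letter alphabet, and MI between bases a fixed distance apart. The function names, the `lag` parameter, and the normalization are illustrative assumptions, not the authors' method, and the sketch is not expected to reproduce the reported H' range.

```python
import math
from collections import Counter


def shannon_entropy(seq, alphabet="ACGT"):
    """Normalized Shannon entropy of base composition (0 to 1).

    Assumption: normalization divides by log2(len(alphabet)); the record
    does not say which normalization the article used.
    """
    counts = Counter(b for b in seq.upper() if b in alphabet)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("sequence contains no symbols from the alphabet")
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(len(alphabet))


def mutual_information(seq, lag=1, alphabet="ACGT"):
    """Mutual information (bits) between bases at positions i and i + lag.

    Symbols outside the alphabet are dropped before pairing, which is a
    simplification chosen for this sketch.
    """
    bases = [b for b in seq.upper() if b in alphabet]
    pairs = list(zip(bases, bases[lag:]))
    n = len(pairs)
    if n == 0:
        raise ValueError("sequence too short for the requested lag")
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )
```

Per-gene values from functions like these could then form the rows of a feature table for principal component or cluster analysis, which is the kind of multivariate step the summary describes.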