Exact probability of fixed patterns occurring in a random sequence

Bibliographic Details
Published in: Communications in statistics. Simulation and computation Vol. 51; no. 9; pp. 4867-4882
Main Authors: Sheng, Ke-Ning, Naus, Joseph I.
Format: Journal Article
Language: English
Published: Philadelphia Taylor & Francis 27-09-2022
Taylor & Francis Ltd
Description
Summary: We derive a procedure to obtain the exact probability that a specific pattern of letters occurs in a longer random sequence of letters. The procedure generalizes to the exact probability of a single fixed (specific) pattern, or of a union or intersection of multiple fixed patterns, within a random sequence, for any distribution of the letters in each cell of the sequence. It can also handle patterns with uncertain letters (missing, blank, unclear, ambiguous, transposed, etc.). In addition, the procedure finds the probability that a randomly picked pattern will appear in a separate, longer random sequence of letters. These methods are particularly applicable to genetic sequence analysis, diagnostics, anthropology, clinical medicine, data mining, computational molecular biology, and pattern analysis and recognition.
ISSN: 0361-0918, 1532-4141
DOI: 10.1080/03610918.2020.1766500
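
To make the quantity in the summary concrete, the sketch below computes the exact probability that one fixed pattern occurs at least once in a random sequence of n letters. It is not the authors' procedure, which this record does not reproduce. It is a standard dynamic program over a prefix-matching (KMP-style) automaton, and it assumes the letters are independent and identically distributed. The function name `pattern_occurrence_probability` and its parameters are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def pattern_occurrence_probability(pattern, probs, n):
    """Exact P(pattern occurs at least once in n i.i.d. letters).

    Assumption (not from the source): each letter is drawn independently
    from `probs`, a dict mapping letter -> probability.
    """
    m = len(pattern)
    if m == 0:
        return Fraction(1)
    if n < m:
        return Fraction(0)
    letters = {c: Fraction(p) for c, p in probs.items()}

    # KMP failure function: fail[i] is the length of the longest proper
    # prefix of pattern[:i + 1] that is also a suffix of it.
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    def step(state, letter):
        # Next matched-prefix length after reading `letter` in `state`.
        while state > 0 and pattern[state] != letter:
            state = fail[state - 1]
        if pattern[state] == letter:
            state += 1
        return state

    trans = [{c: step(s, c) for c in letters} for s in range(m)]

    # dist[s]: probability that the pattern has not occurred yet and the
    # longest pattern prefix matching the current suffix has length s.
    dist = [Fraction(0)] * m
    dist[0] = Fraction(1)
    hit = Fraction(0)  # probability the pattern has already occurred
    for _ in range(n):
        new = [Fraction(0)] * m
        for s, p in enumerate(dist):
            if p == 0:
                continue
            for c, pc in letters.items():
                t = trans[s][c]
                if t == m:
                    hit += p * pc  # full match: absorb this mass
                else:
                    new[t] += p * pc
        dist = new
    return hit


if __name__ == "__main__":
    coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
    print(pattern_occurrence_probability("HH", coin, 3))  # 3/8
```

With a fair coin and n = 3, the sequences containing "HH" are HHH, HHT and THH, which gives 3/8. Using `Fraction` keeps the result exact rather than a floating-point approximation. Unions of patterns, uncertain letters, and non-identical cell distributions, all mentioned in the summary, are outside what this sketch covers.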