Optimal estimator of hypothesis probability for data mining problems with small samples

Bibliographic Details
Published in:	International journal of applied mathematics and computer science, Vol. 22, no. 3, pp. 629-645
Main Authors:	Piegat, Andrzej; Landowski, Marek
Format:	Journal Article
Language:	English
Published:	Sciendo 01-09-2012
Subjects:	completeness interpretation of probability; frequency interpretation of probability; probability; probability estimation; single-case problem; uncertainty theory

Description
Summary:	The paper presents a new (to the best of the authors’ knowledge) estimator of probability called the “Eph√2 completeness estimator”, along with a theoretical derivation of its optimality. The estimator is especially suitable for samples with a small number of items, a feature of many real problems characterized by data insufficiency. The control parameter of the estimator is not assumed a priori in a subjective way, but is determined on the basis of an optimization criterion (least absolute errors). The estimator was compared for accuracy with the universally used frequency estimator of probability and with Cestnik’s m-estimator. The comparison was carried out both theoretically and experimentally. The results show that the Eph√2 completeness estimator is superior to the frequency estimator for the probability interval ph ∈ (0.1, 0.9). The frequency estimator is better for ph ∈ [0, 0.1] and ph ∈ [0.9, 1].
ISSN:	2083-8492
DOI:	10.2478/v10006-012-0048-z
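
Illustration (not part of the record): the kind of comparison described in the summary can be sketched in a few lines of Python. The frequency estimator and Cestnik's m-estimate, (n_h + m * prior) / (n + m), are standard. This record does not give the formula of the Eph√2 completeness estimator, so the function `shrunk_estimate` below is a hypothetical stand-in that assumes m = √2 and a prior of 0.5; the helper names and the sample size n = 3 are likewise chosen only for illustration.

```python
import math


def frequency_estimate(n_h, n):
    """Relative frequency n_h / n; undefined for an empty sample."""
    if n == 0:
        raise ValueError("frequency estimate needs at least one sample item")
    return n_h / n


def m_estimate(n_h, n, m, prior):
    """Cestnik's m-estimate: (n_h + m * prior) / (n + m)."""
    return (n_h + m * prior) / (n + m)


def shrunk_estimate(n_h, n):
    # ASSUMPTION: m = sqrt(2) and prior = 0.5 are guesses suggested by the
    # estimator's name; the catalog record does not state the formula.
    return m_estimate(n_h, n, math.sqrt(2), 0.5)


def mean_abs_error(estimator, p, n):
    """Exact expected |estimate - p| when n_h follows Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) * abs(estimator(k, n) - p)
        for k in range(n + 1)
    )


if __name__ == "__main__":
    n = 3  # a deliberately small sample
    for p in (0.05, 0.5):
        err_freq = mean_abs_error(frequency_estimate, p, n)
        err_shrunk = mean_abs_error(shrunk_estimate, p, n)
        print(f"p={p}: frequency={err_freq:.4f}, shrunk={err_shrunk:.4f}")
```

Under these assumptions, the stand-in has the smaller expected absolute error at p = 0.5 and the frequency estimator has the smaller error at p = 0.05, which matches the qualitative pattern reported in the summary. The exact interval boundaries quoted there come from the paper and are not reproduced by this sketch.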