Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

Bibliographic Details
Published in: Computers & Geosciences, Vol. 33, no. 5, pp. 696-704
Main Authors: Lee, Lopaka; Helsel, Dennis
Format: Journal Article
Language: English
Published: Oxford Elsevier Ltd 01-05-2007
Elsevier Science
Description
Summary: Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data, perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply censored data is the Kaplan–Meier (K–M) method. This method has seen widespread usage in the medical sciences within a general framework termed “survival analysis”, where it is employed with right-censored time-to-failure data. However, K–M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K–M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, the estimation of prediction or exceedance probabilities, and the computation of related confidence limits. Additionally, our software contains K–M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K–M methods is that they perform neither extrapolation nor interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
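The idea of applying K–M to left-censored data can be illustrated with a short sketch. The Python code below is not the authors' S-language software. It is a minimal illustration that assumes the usual product-limit estimator applied in the "reverse" direction, which is equivalent to flipping the data into right-censored form. The function names `km_left_censored_ecdf` and `exceedance_probability` and the sample data are hypothetical. A nondetect reported as "<5" is assumed to be coded as the value 5 with its censoring flag set.

```python
from collections import Counter


def km_left_censored_ecdf(values, censored):
    """Kaplan-Meier estimate of P(X <= x) for left-censored data.

    values   -- detected concentrations, or the detection limit for nondetects
    censored -- True where the entry is a nondetect ("< value")

    Returns a list of (x, P(X <= x)) at each distinct detected value,
    sorted by ascending x. Illustrative sketch, not the published routines.
    """
    if len(values) != len(censored):
        raise ValueError("values and censored must have the same length")

    # Count detected (uncensored) observations at each distinct value.
    detected = Counter(v for v, c in zip(values, censored) if not c)

    cdf = 1.0
    steps = []
    # Walk from the largest detected value downward (the "flipped" time axis).
    for x in sorted(detected, reverse=True):
        steps.append((x, cdf))
        # Risk set: every observation, detected or not, with value <= x.
        at_risk = sum(1 for v in values if v <= x)
        # Product-limit update; cdf now holds P(X < x).
        cdf *= 1.0 - detected[x] / at_risk
    steps.reverse()
    return steps


def exceedance_probability(ecdf, threshold):
    """P(X > threshold) read from the step function, with no interpolation."""
    if not ecdf or threshold < ecdf[0][0]:
        # K-M does not extrapolate below the smallest detected value.
        raise ValueError("threshold is outside the range of detected values")
    cdf = ecdf[0][1]
    for x, p in ecdf:
        if x <= threshold:
            cdf = p
    return 1.0 - cdf


if __name__ == "__main__":
    # Hypothetical sample with two detection limits (1 and 5).
    conc = [1, 1, 2, 3, 5, 6, 8, 10]
    nondetect = [True, True, False, False, True, False, False, False]
    ecdf = km_left_censored_ecdf(conc, nondetect)
    for x, p in ecdf:
        print(x, round(p, 4))
    print("P(X > 6) =", round(exceedance_probability(ecdf, 6), 4))
```

For the hypothetical sample, the estimated probability of exceeding 6 is about 0.25. Asking for a threshold below the smallest detected value raises an error, which mirrors the point made in the summary that K–M methods do not extrapolate beyond the observed data.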
ISSN: 0098-3004, 1873-7803
DOI: 10.1016/j.cageo.2006.09.006