Water Quality Data Gap Filling Methods for Environmental Consulting: Non-Detects, Kaplan-Meier, and ROS

Environmental data is almost never complete. A percentage of results come back as non-detects (U-flagged, reported as less than the reporting limit), some samples are lost in the lab, and some events are missed entirely. How you handle these gaps materially changes the exceedance table, the trend analysis, and any statistical comparison against a background or a standard. Doing it wrong produces indefensible results.

This guide covers the water quality data gap filling methods most commonly used in environmental consulting — what each method does, when to use it, and when it will steer you wrong. The methods are grounded in EPA guidance and USEPA’s ProUCL documentation, not generic statistical folklore.

Types of Data Gaps

Before picking a method, classify the gap. The right approach depends on what’s missing:

Left-censored data (non-detects) — the most common case. The result is below the reporting limit (RL); the true value is somewhere between zero and the RL.
Missing at random — a sample was lost, a bottle broke, or the lab couldn’t analyze for an unrelated reason. The absence is unrelated to the value.
Missing not at random — sampling was skipped because the location was known to be dry, contaminated, or inaccessible. The absence is correlated with the value.
Rejected data (R-flagged) — results exist but failed validation. Treat as missing, not as zero.

Only left-censored and missing-at-random data have defensible gap-filling methods. For missing-not-at-random data, the defensible action is to document the gap and its cause, not to estimate a value.

Non-Detect Substitution Methods (and Their Tradeoffs)

The historical Excel approach is “substitute the non-detect with some fraction of the RL.” This works for preliminary screening but biases summary statistics and is no longer considered defensible for anything beyond a simple maximum-value exceedance check.

Simple Substitution

Replace each non-detect with 0, RL/2, or RL:

Substitution Rule $$\text{value}_{ND} \in \{0, \; RL/2, \; RL\}$$

When it’s acceptable: screening-level comparison where only the maximum value matters. If every non-detect’s RL is already below the applicable standard, the choice of substitution doesn’t affect the exceedance determination.
When it fails: calculating means, standard deviations, or confidence intervals. Substitution inflates or deflates the mean depending on the fraction used, and it distorts the variance because every non-detect sharing an RL is assigned the same value.
Detection limit inadequacy: if any RL is above the applicable standard, substitution cannot prove compliance. Flag the location as “cannot determine compliance” and request re-analysis at a lower detection limit.
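A minimal Python sketch, using made-up results, illustrates why the substitution choice is harmless for a maximum-value check but not for the mean. The function names, the (value, detected) data layout, and the 5 ug/L standard are illustrative assumptions, not part of ProUCL or any EPA tool.

```python
from statistics import mean

def substitute(results, fraction):
    """Replace each non-detect with fraction * RL (fraction = 0, 0.5, or 1).

    `results` is a list of (value, detected) pairs; for a non-detect,
    `value` holds the reporting limit.
    """
    return [value if detected else fraction * value
            for value, detected in results]

def inadequate_rls(results, standard):
    """Return the RLs of non-detects that sit above the standard."""
    return [value for value, detected in results
            if not detected and value > standard]

# Hypothetical results in ug/L: three detects, three non-detects at RL = 1.0
results = [(2.4, True), (1.0, False), (3.1, True),
           (1.0, False), (1.0, False), (1.6, True)]

for fraction in (0.0, 0.5, 1.0):
    filled = substitute(results, fraction)
    print(f"ND = {fraction} x RL: max = {max(filled)}, mean = {mean(filled):.2f}")

# An empty list means no RL exceeds the (assumed) 5 ug/L standard
print(inadequate_rls(results, standard=5.0))
```

With these numbers the maximum stays at 3.1 under all three rules, while the mean moves from about 1.18 to about 1.68 depending only on the substitution fraction.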

Kaplan-Meier (Survival Analysis)

Originally a biostatistics method for censored survival times, Kaplan-Meier treats non-detects as left-censored observations and estimates the empirical cumulative distribution function. It does not assume any parametric distribution.

When to use: datasets with multiple reporting limits (common with method changes, dilution variations, or different labs); distribution is unknown or non-normal.
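What you get: nonparametric estimates of the mean, median, percentiles, and standard deviation. The mean tends to be biased high when the lowest observations in the dataset are non-detects, because the estimator has no information about how far below the smallest detected value they fall.
Limits: assumes the reporting limit assigned to each sample does not depend on that sample’s true concentration. This generally holds for routine analytical chemistry, but RLs raised by sample dilution can violate it.
Tooling: USEPA ProUCL implements Kaplan-Meier. In R, survfit() in the survival package and cenfit() in the NADA package also support it.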
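The product-limit calculation for left-censored data can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a reimplementation of ProUCL or NADA: the function name and data layout are hypothetical, a non-detect is counted in the risk set at any detected value at or above its RL, and the probability mass left below the smallest detected value is assigned to that value. Tools that handle the leftover mass differently will report a different mean.

```python
def km_left_censored(values, detected):
    """Kaplan-Meier CDF and mean for left-censored data.

    values[i] is the concentration for a detect or the RL for a non-detect;
    detected[i] is True for detects. Returns (steps, mean), where steps is
    a list of (x, F(x)) pairs, F(x) = estimated P(X <= x), one per distinct
    detected value in ascending order.
    """
    detects = sorted({v for v, d in zip(values, detected) if d}, reverse=True)
    if not detects:
        raise ValueError("Kaplan-Meier needs at least one detected value")

    steps = []
    cdf = 1.0    # F at the largest detected value
    mean = 0.0
    for i, x in enumerate(detects):
        # Risk set: every result (detect or non-detect) with value or RL <= x
        n_at_risk = sum(1 for v in values if v <= x)
        n_events = sum(1 for v, d in zip(values, detected) if d and v == x)
        steps.append((x, cdf))
        below = cdf * (1 - n_events / n_at_risk)   # estimated P(X < x)
        if i == len(detects) - 1:
            # Assumption: leftover mass goes to the smallest detected value
            mass = cdf
        else:
            mass = cdf - below
        mean += x * mass
        cdf = below
    steps.reverse()
    return steps, mean

# Hypothetical data in ug/L: detects at 3.0, 2.0, 0.8; non-detects <1.0, <0.5
values = [3.0, 2.0, 0.8, 1.0, 0.5]
detected = [True, True, True, False, False]
steps, km_mean = km_left_censored(values, detected)
print(steps, round(km_mean, 2))
```

For this small dataset the estimated CDF is 0.6 at 0.8 ug/L, 0.8 at 2.0 ug/L, and 1.0 at 3.0 ug/L, giving a Kaplan-Meier mean of about 1.48 ug/L. Note that the two reporting limits are handled without any substitution.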

Regression on Order Statistics (ROS)

ROS fits a distribution (typically lognormal) to the detected values, then uses the fitted distribution to estimate values for the non-detects at their rank-ordered positions.

When to use: modest non-detect fractions (roughly 10–50%); a reasonable parametric distribution can be assumed.
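What you get: estimated individual values that can be used for further calculations, plus summary statistics that are close to unbiased when the assumed distribution fits. The estimated values are meaningful only collectively, as inputs to summary statistics, not as results for specific samples.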
Limits: performs poorly when non-detect fraction exceeds 50% or when the fitted distribution is a bad match for the detected values.
Tooling: ProUCL implements robust ROS. The R NADA package has ros().
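The idea can be sketched for the simplest case: a single reporting limit with every detect at or above it. This is a simplified illustration, not the robust ROS in ProUCL or NADA's ros(), which use more involved plotting positions to handle multiple reporting limits. The function name, the Weibull plotting positions i/(n+1), and the example data are all assumptions made for the sketch.

```python
from math import exp, log
from statistics import NormalDist, mean

def ros_single_rl(detects, n_nondetects, rl):
    """Lognormal ROS for one reporting limit.

    Returns the imputed non-detect values followed by the sorted detects.
    """
    detects = sorted(detects)
    if len(detects) < 2:
        raise ValueError("need at least two detects to fit a line")
    if detects[0] < rl:
        # The RL is used only to check the single-RL assumption
        raise ValueError("sketch assumes every detect is at or above the RL")

    n = len(detects) + n_nondetects
    # Normal quantiles of Weibull plotting positions for ranks 1..n;
    # non-detects take the lowest ranks, detects the highest.
    z = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    z_nd, z_det = z[:n_nondetects], z[n_nondetects:]

    # Least-squares line of log(detect) against its normal quantile
    y = [log(v) for v in detects]
    z_bar, y_bar = mean(z_det), mean(y)
    slope = (sum((a - z_bar) * (b - y_bar) for a, b in zip(z_det, y))
             / sum((a - z_bar) ** 2 for a in z_det))
    intercept = y_bar - slope * z_bar

    # Extrapolate the fitted line to the non-detect ranks
    imputed = [exp(intercept + slope * q) for q in z_nd]
    return imputed + detects

# Hypothetical data in ug/L: 5 detects and 3 non-detects at RL = 1.0 (37.5% ND)
filled = ros_single_rl([1.2, 1.9, 2.6, 4.0, 6.5], n_nondetects=3, rl=1.0)
print(f"ROS mean = {mean(filled):.2f} ug/L")
```

The detected values are kept as measured; only the non-detects are replaced by values read off the fitted line, which is what makes the approach "robust" to modest departures from the assumed distribution.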

Maximum Likelihood Estimation (MLE)

MLE fits a parametric distribution (usually lognormal) to the combined detected and censored data by maximizing the joint likelihood.

When to use: you have strong evidence the data follow a specific parametric distribution and the sample size is large (~20+).
Limits: sensitive to distributional assumptions; can give misleading results when the distribution is misspecified.
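For the lognormal case, the joint likelihood can be written out as follows. The notation is introduced here for illustration and is not taken from the source guidance: D is the set of detects with values x_i, C is the set of non-detects with reporting limits RL_j, and φ and Φ are the standard normal density and cumulative distribution function.

Censored Lognormal Likelihood $$L(\mu, \sigma) = \prod_{i \in D} \frac{1}{x_i \, \sigma} \, \phi\!\left(\frac{\ln x_i - \mu}{\sigma}\right) \; \prod_{j \in C} \Phi\!\left(\frac{\ln RL_j - \mu}{\sigma}\right)$$

Each detect contributes its density, and each non-detect contributes only the probability of falling below its RL. Once μ and σ are estimated, the lognormal mean follows as exp(μ + σ²/2), which is where a misspecified distribution does the most damage.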

Choosing Among the Methods

EPA’s guidance and the USGS SCOUT documentation converge on a practical decision framework based on non-detect percentage:

< 15% non-detects: simple substitution with RL/2 is acceptable for screening; use Kaplan-Meier or ROS if you need defensible summary statistics.
15–50% non-detects: Kaplan-Meier is the preferred default (no distributional assumption). ROS is acceptable if the detected values clearly follow a lognormal distribution.
50–80% non-detects: Kaplan-Meier only. ROS becomes unreliable. Consider whether any statistical comparison is defensible with so little information.
> 80% non-detects: do not calculate summary statistics. Report the detection frequency, the range of RLs, and the number of detects above the standard. That’s the defensible statement.
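The framework above can be expressed as a small lookup, sketched here in Python. The function name and return strings are hypothetical, and the handling of the exact boundaries (15%, 50%, and 80% are assigned to the lower-numbered band that names them) is an assumption, since the bands as written do not say which side each boundary falls on.

```python
def recommend_method(n_nondetects, n_total, screening_only=False):
    """Map a non-detect percentage to the method bands described above."""
    if n_total <= 0 or not 0 <= n_nondetects <= n_total:
        raise ValueError("counts must satisfy 0 <= n_nondetects <= n_total, n_total > 0")

    pct = 100.0 * n_nondetects / n_total
    if pct < 15:
        return "RL/2 substitution" if screening_only else "Kaplan-Meier or ROS"
    if pct <= 50:
        return "Kaplan-Meier (ROS if detects are clearly lognormal)"
    if pct <= 80:
        return "Kaplan-Meier only"
    return "No summary statistics; report detection frequency and RL range"

# 14 non-detects out of 24 results is about 58%
print(recommend_method(14, 24))
```

A lookup like this is only a starting point; the distribution of the detected values and the decision the data must support still need a human judgment.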

Worked Example: A quarterly groundwater monitoring dataset for trichloroethene has 24 results: 14 non-detects at RLs of 1.0 and 0.5 ug/L, and 10 detects ranging from 0.8 to 6.2 ug/L. Non-detect fraction = 14/24 = 58%. Simple substitution with RL/2 would give a biased mean; ROS is unreliable above 50% NDs. Use Kaplan-Meier in ProUCL to estimate the mean and 95% UCL for comparison against the MCL of 5 ug/L.

Handling Data Qualifiers

Non-detects are only one type of qualified data. A complete gap-handling strategy also deals with:

J-flagged (estimated): detected but below the PQL, or affected by QC issues. Use the reported value but document the qualifier. A J-result above a standard is still an exceedance, but warrants discussion of the confidence in the determination.
UJ-flagged: not detected, with an estimated detection limit. Treat as non-detect with elevated uncertainty about the RL.
R-flagged (rejected): the data are unusable. Treat the sample as missing and request re-sampling or re-analysis. Never substitute a value for rejected data.
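One way to keep these rules consistent across a project is to encode them once and route every result through the same lookup. The sketch below is an illustration only: the table and function names are hypothetical, an empty qualifier is assumed to mean an unqualified detect, and real validation reports use additional flags that would need their own entries.

```python
# Qualifier -> (how the result enters the statistics, handling note)
TREATMENT = {
    "":   ("detect", "use reported value"),
    "J":  ("detect", "use reported value; document as estimated"),
    "U":  ("non-detect", "left-censored at the RL"),
    "UJ": ("non-detect", "left-censored; the RL itself is estimated"),
    "R":  ("missing", "rejected; never substitute a value"),
}

def classify(qualifier):
    """Return the (status, note) pair for a validation qualifier."""
    key = (qualifier or "").strip().upper()
    if key not in TREATMENT:
        # Fail loudly rather than guess at an unfamiliar flag
        raise ValueError(f"no handling rule defined for qualifier {qualifier!r}")
    return TREATMENT[key]

for flag in ("", "J", "U", "UJ", "R"):
    status, note = classify(flag)
    print(f"{flag or '(none)':6} -> {status}: {note}")
```

Raising an error on an unknown flag is a deliberate choice here: silently treating an unrecognized qualifier as a detect is exactly how rejected data ends up in a summary statistic.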

See the groundwater monitoring data management guide for how qualifiers flow through the rest of the compliance workflow.

Filling Temporal Gaps (Missed Sampling Events)

When a scheduled sampling event was missed entirely, the options are narrow:

Document and report: under most NPDES permits, a missed sampling event is itself a permit condition violation that must be reported. Do not backfill with estimated data.
Annual / trend analyses: if a missed event leaves a seasonal gap, explicitly exclude that period from the trend analysis and note the exclusion. Do not interpolate.
Background / reference comparisons: statistical comparisons against background should use only the events that were actually sampled at both the site and the background location. Paired-sample analyses are preferred.
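Restricting a comparison to events sampled at both locations can be sketched as a simple key intersection. The function name, the event labels, and the concentrations below are made up for illustration, and a missed event is assumed to be recorded either as None or as an absent key.

```python
def paired_events(site, background):
    """Return (shared, dropped) event labels for a paired comparison.

    `site` and `background` map an event label to a result, with None
    (or a missing key) marking an event that was not sampled.
    """
    shared = sorted(
        event for event in site.keys() & background.keys()
        if site[event] is not None and background[event] is not None
    )
    dropped = sorted((site.keys() | background.keys()) - set(shared))
    return shared, dropped

# Hypothetical quarterly results in ug/L
site = {"2023-Q1": 4.1, "2023-Q2": None, "2023-Q3": 3.8, "2023-Q4": 4.4}
background = {"2023-Q1": 0.9, "2023-Q2": 1.1, "2023-Q4": 1.0}

shared, dropped = paired_events(site, background)
print("compare on:", shared)
print("excluded (document in report):", dropped)
```

Here only Q1 and Q4 are compared; Q2 and Q3 are excluded rather than interpolated, and the excluded list gives the report its documentation trail.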

Defensibility Checklist

Before you finalize any gap-filled dataset, confirm:

The method chosen matches the non-detect fraction per EPA/ProUCL guidance
All reporting limits are below the applicable standard (flag detection limit inadequacy otherwise)
All qualifiers (J, U, UJ, R) are preserved in the final dataset, not dropped
The method and its rationale are documented in the report’s QA/QC section
Results are cross-checked by a second staff member before release
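The qualifier item on this checklist lends itself to an automated check before release. The following Python sketch compares qualifiers between the validated lab data and the final dataset; the function name, the sample IDs, and the dict layout are hypothetical.

```python
def dropped_qualifiers(raw, final):
    """List sample IDs whose qualifier was lost or changed.

    `raw` and `final` map a sample ID to its qualifier string
    ("" for an unqualified result).
    """
    return sorted(
        sample_id for sample_id, qualifier in raw.items()
        if qualifier and final.get(sample_id) != qualifier
    )

# Hypothetical example: the J flag on MW-2 was stripped during processing
raw = {"MW-1": "U", "MW-2": "J", "MW-3": "", "MW-4": "R"}
final = {"MW-1": "U", "MW-2": "", "MW-3": "", "MW-4": "R"}
print(dropped_qualifiers(raw, final))
```

An empty list is the passing result; any sample ID returned should be traced back before the dataset goes to the second reviewer.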

Authoritative references: USEPA ProUCL software and technical documentation remain the practical standard for censored environmental data; EPA’s Guidance on Environmental Data Verification and Data Validation covers the broader QA/QC framework. The right method for your dataset depends on the fraction of non-detects, the distribution of detected values, and what decision the data support: a screening comparison is held to a different standard than a risk assessment.