3. General Statistical Approach

This guidance takes a broad view of groundwater monitoring and compliance. Not every site undergoes the same project life cycle stages (see Section 1.5) or is governed by the same regulations, but groundwater monitoring at every site provides data for statistical analysis that can help support decision making. Some of the statistical approaches in this document might only be applied at larger sites with extensive data sets. Others can be used at even the smallest of sites, assuming that a reasonable minimum number of measurements are collected.

Throughout the project life cycle, systematic planning should form the basis for collection and analysis of groundwater data. One of the first steps for any site is to establish a working conceptual site model (CSM): a living collection of information about a site that considers factors such as environmental and land use plans, site-specific chemical and geologic conditions, and the regulatory environment (ITRC 2007b). The CSM is updated during the project as new information is gathered. In addition, the project planning team defines the data quality objectives (DQOs) and then determines the appropriate type and quality of data needed to answer questions of interest. DQOs are the qualitative and quantitative statements derived from the DQO process that clarify the study’s technical and quality objectives, define the appropriate type of data, and specify tolerable levels of potential decision errors that will be used as the basis for establishing the quality and quantity of data (USEPA 2002b). From a statistical standpoint, exploratory data analysis (EDA) should generally be used to review data quality and select appropriate statistical methods. EDA is an approach for initial data evaluation that uses graphical methods to explore, with an open mind, the underlying structure and model of a data set and so aid in selecting the best statistical methods; typical techniques are box plots, time series plots, histograms, and scatter plots (Tukey 1977; NIST/SEMATECH 2012; Unified Guidance).

3.1 Introduction to Conceptual Site Models

Developing a statistical approach based on a CSM is an initial investment that can save significant time and money and prevent poor decisions. The CSM should be developed before deciding on the statistical methods to be used. A CSM is developed from site information such as sources, geology, hydrogeology, land use, and soil and groundwater data generated at sampling points across the site. Groundwater sampling points (or sampling locations) are most often monitoring wells, but could be other types such as direct push sampling points, temporary probes, or field sensors. Locations may be upgradient of a release and plume, downgradient, side gradient, or within a groundwater plume. For some sites it is important to determine whether intrawell testing (comparison of measurements over time at one monitoring well) or interwell testing (statistical analysis of data collected from different monitoring wells) will be useful (Unified Guidance). An example is presented below to illustrate comparing intrawell and interwell statistical testing.
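As one hedged illustration of the intrawell and interwell distinction, the sketch below compares the same new measurement against two different baselines: the well's own history (intrawell) and pooled upgradient background wells (interwell). The well names, the concentrations, and the helper `standardized_excess` are hypothetical, and the standardized difference shown here is only a descriptive measure, not a formal prediction limit or compliance test.

```python
from statistics import mean, stdev

# Hypothetical measurements (mg/L) in time order; not real site data.
background_wells = {
    "BG-1": [4.0, 4.2, 3.8, 4.1],
    "BG-2": [4.3, 3.9, 4.0, 4.1],
}
compliance_history = [6.0, 6.2, 5.9, 6.1]  # earlier data from well MW-3
new_measurement = 6.3                      # latest result at well MW-3


def standardized_excess(baseline, new_value):
    """How many baseline standard deviations new_value lies above the baseline mean."""
    return (new_value - mean(baseline)) / stdev(baseline)


# Interwell: the baseline is pooled data from the upgradient background wells.
pooled_background = [x for values in background_wells.values() for x in values]
interwell = standardized_excess(pooled_background, new_measurement)

# Intrawell: the baseline is the compliance well's own earlier measurements.
intrawell = standardized_excess(compliance_history, new_measurement)

print(f"interwell excess: {interwell:.2f} SD, intrawell excess: {intrawell:.2f} SD")
```

In this made-up data set the compliance well has always been higher than the background wells, so the interwell comparison looks extreme while the intrawell comparison does not. Whether that difference reflects natural spatial variability or an earlier release is a question for the CSM, not for the arithmetic.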

3.2 Developing a Conceptual Site Model

An initial CSM is essential to formulating a statistical approach as well as to deciding which data and analyses are appropriate for the current stage of the project.

3.2.1 Target Population

For statistical purposes, knowledge about a site’s hydrogeology and its CSM is critical for determining the nature and stability of the target population of groundwater measurements. In a highly stable, homogeneous, sandy geologic environment, groundwater concentrations may be fairly consistent over time. In a highly fractured or karst environment, significant discontinuities may exist in concentrations, even at nearby wells. For all sites, changes over time in regional conditions (such as a multi-year drought) may cause groundwater concentrations to change so much that past data may not be similar to more recent measurements. In that case, more than one target population may exist, with the newer population no longer being statistically the same as the older population, even though collected from the same site.

Understanding Target Populations

Perhaps the best way to make sense of sampling data is to understand the target population of measurements from which those data are drawn. For instance, an aquifer system may be conceptualized as a complex, dynamic four-dimensional object, with three dimensions representing groundwater subsurface volume over a prescribed boundary and depth, and one representing time. Physical groundwater samples are collected at specific locations and depths within the three-dimensional volume, but also at certain points in time. The target population associated with a given set of measurements could represent the entire history of the aquifer, but more commonly the goal is to say something about a specific time period or a specific hydrostratigraphic unit or layer, for instance, shallow zone circa 2013 or the local aquifer surrounding well A-1 over the past two years. Any statistical conclusion (or inference) drawn from the data only applies to the target population, so defining or understanding that target is of prime importance.

3.2.2 Background Concentrations

For compliance purposes, project managers must determine what portion of the subsurface (regarded in four dimensions) adequately represents background concentrations in order to answer the following questions:

Answers to these questions will facilitate good decisions as the project progresses and can ultimately reduce costs and avoid delays.

Given the dynamic nature of the subsurface, measuring background at a single point in space and time is generally inadequate. Background measured at a single time gives no indication of whether background conditions might change in the future. A single background sampling point confounds spatial variability and actual contamination. Multiple background sampling points allow for (1) assessment of the presence of significant spatial variation; (2) faster accumulation of adequate background data for statistical purposes; and (3) a better understanding of the uncontaminated subsurface.
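A minimal sketch of point (1) follows, assuming three hypothetical background wells with four measurements each. It computes a one-way ANOVA F statistic, which compares variation between wells with variation within wells. This is only one possible screening approach: it assumes roughly normal data with similar variances, and a nonparametric alternative may be more suitable for many groundwater data sets. The function name `anova_f_statistic` and all values are illustrative.

```python
from statistics import mean


def anova_f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k = len(groups)            # number of wells
    n_total = len(all_values)  # total number of measurements

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical background concentrations (mg/L) at three upgradient wells.
bg_1 = [4.0, 4.2, 3.8, 4.0]
bg_2 = [5.0, 5.2, 4.8, 5.0]
bg_3 = [4.5, 4.7, 4.3, 4.5]

f_stat = anova_f_statistic([bg_1, bg_2, bg_3])
print(f"F statistic with 2 and 9 degrees of freedom: {f_stat:.1f}")
```

A large F statistic relative to the critical value for the stated degrees of freedom suggests that the wells differ from one another, in which case pooling them into a single background population may not be appropriate.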

3.2.3 Multiple Source Areas

A good CSM is critical to statistical evaluations of overlapping plumes or multiple contaminant source areas. Questions relevant to these situations include:

Again, a statistical approach based on the CSM and the answers it provides helps to ensure that the collected data are useful for making compliance decisions.

3.2.4 Monitored Natural Attenuation

A sound statistical approach can also help support the common remedy of monitored natural attenuation (MNA). Groundwater monitoring data at a sampling point can be initially tested for a statistically significant trend to determine whether an MNA remedy is or may be effective. However, consistent groundwater flow paths are essential, as are monitoring wells that accurately capture those paths. Later, the groundwater monitoring data at a sampling point may be tested for a stabilizing trend. The amount of groundwater data needed will depend on the level of statistical confidence (the likelihood that a range of values will contain the population parameter of interest; NIST/SEMATECH 2012) required for detecting temporal trends and for deciding whether concentrations are projected to remain below criteria. Additionally, if monitoring is scheduled to continue indefinitely, the sampling frequency can be optimized statistically, but this will again require input from the CSM as well as the regulatory drivers governing the remedy.
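The text above does not name a specific trend test. As a sketch under that caveat, the example below applies the Mann-Kendall test, a nonparametric trend test, to a hypothetical series of ten quarterly concentrations at one well. The function name `mann_kendall` and the data are assumptions for illustration. The p-value uses the normal approximation, which is less reliable for short series, and the test assumes the measurements are not serially correlated.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall(values):
    """Return (S, Z, two-sided p) for the Mann-Kendall trend test.

    values must be in time order. Uses the normal approximation with a
    correction for tied values.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair

    # Variance of S, reduced for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18

    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p_value


# Hypothetical quarterly concentrations (ug/L) at a single monitoring well.
concentrations = [120, 112, 105, 99, 101, 90, 84, 80, 77, 70]
s_stat, z_stat, p_value = mann_kendall(concentrations)
print(f"S = {s_stat}, Z = {z_stat:.2f}, p = {p_value:.5f}")
```

A negative S with a small p-value is consistent with a decreasing trend at that sampling point. It does not by itself show that the remedy is effective, since flow paths and well placement still have to be supported by the CSM.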

3.3 Understanding the Data

Before conducting formal statistical evaluations, review the data. This review should include (1) reviewing data quality, (2) assessing the extent and usefulness of any historical data, and (3) exploring the data for general patterns and characteristics. One general way to aid in this understanding is through a collection of numerical and graphical statistical techniques known as exploratory data analysis (EDA, see Section 3.3.3). EDA can help to identify any data quality problems (such as anomalies or inconsistencies) as well as basic attributes of the data, such as its shape, spread (for example, standard deviation), and central tendency (for example, the mean or the median, which is the 50th percentile of an ordered set of samples; Unified Guidance).
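The numerical side of EDA can be sketched with Python's standard `statistics` module. The example below is a minimal illustration, assuming a single well with eight hypothetical measurements that include one unusually high value; the function name `summarize` and the data are not from this guidance.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for one set of measurements."""
    ordered = sorted(values)
    q1, _, q3 = statistics.quantiles(ordered, n=4)  # quartile cut points
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "std_dev": statistics.stdev(ordered),  # sample standard deviation
        "iqr": q3 - q1,                        # interquartile range
        "min": ordered[0],
        "max": ordered[-1],
    }


# Hypothetical concentrations (ug/L); 9.5 is a deliberately high value.
measurements = [2.1, 2.4, 1.9, 2.8, 3.0, 2.2, 9.5, 2.6]
summary = summarize(measurements)

for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the mean is noticeably larger than the median, a simple numerical hint of right skew or a possible anomaly. Graphical tools such as box plots and time series plots would normally be used alongside these numbers before deciding how to treat the high value.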

3.3.1 Data Quality

3.3.2 Historical Data

3.3.3 Exploratory Data Analysis

3.4 Common Statistical Assumptions

Many assumptions are made during a groundwater investigation or in the course of long-term monitoring and compliance. This document focuses only on assumptions relevant to groundwater statistics and also assumes that the general principles of a systematic planning process have been followed during data collection and analysis, and the data are generally appropriate for the intended use (except perhaps for historical data).

3.4.1 Nonrandom Sampling Points and Sampling Times

3.4.2 Nondetects and Uncertain Measurements

3.4.3 Normality

3.4.4 Temporal Independence

3.4.5 Outliers, Identically Distributed Measurements

3.4.6 Temporal Stability

3.5 Testing Assumptions

EDA is described in Section 3.3.3 and is typically the first step in understanding data at a site and in helping to check the assumptions listed in Section 3.4. This section provides some guidance on how to implement EDA for testing statistical assumptions. Appendix F includes further information about checking the underlying assumptions of statistical tests. Effective EDA requires a decision logic or statistical process to sort through the decisions leading to a particular statistical design (see Section 3.6). The EDA process for each site will be different, but a general outline might include the following:

3.6 Statistical Design Considerations

Statistics play a crucial role in properly evaluating groundwater throughout the project life cycle. Therefore, statistical design, which is the intentional planning for statistical analysis and data collection, should always occur at the beginning of the project rather than the end. Ideally, statistical design should occur as part of a systematic planning process in the context of the project’s DQOs and data quality assessment (DQA) process. To link this process more specifically to groundwater analysis, consider the following questions.

3.6.1 How good are my decisions?

3.6.2 What are site-wide false positive rates and power curves?

3.6.3 How much usable data do I have or need?

3.6.4 What are the critical contaminants?

3.6.5 Should I use interwell or intrawell sampling?

3.6.6 Should I retest and how?

3.6.7 Does my monitoring network need to be optimized?

3.6.8 Is geostatistical or spatial analysis of groundwater necessary?

3.6.9 Can I use field screening or the Triad approach?

Publication Date: December 2013

Permission is granted to refer to or quote from this publication with the customary acknowledgment of the source (see suggested citation and disclaimer).