4. Statistical Analysis for Project Life Cycle Stages

Statistical analyses are conducted to answer the study questions listed in Table 4-1. The questions start simply and move to more complex analyses. You may need to reconsider the questions, however, as the project progresses through the life cycle stages. For example, when initially assessing a release or characterizing a site, you will likely determine background concentrations for comparison to site concentrations. Later, you may need to revisit background concentrations when determining compliance with criteria. Some study questions are relevant through all stages of a project life cycle.

Study questions also have a relationship to one another in the context of specific groundwater evaluation objectives such as background or attenuation, which may be important during various project life cycle stages. Study Questions 1 and 2 assess background concentrations that may be of interest during release detection, site characterization, monitoring, and closure stages. Study Questions 3 and 4 assess contaminant concentrations with respect to criteria that may be important in all of the life cycle stages. Study Questions 5 and 6 evaluate temporal trends in data sets that may be of concern during remediation, monitoring, and closure. Study Questions 7 and 8 assess temporal and spatial rates of change for contaminants and are also important concerns during remediation, monitoring, and closure. Study Questions 9 and 10 evaluate if the frequency of sampling and spatial coverage of wells are appropriate, leading to a more optimal monitoring program, which is an important consideration for all life cycle stages. These final questions are used to determine whether a monitoring well network may need to be expanded for sufficient compliance point or plume migration coverage. These questions may also be used to determine whether more or less frequent sampling is necessary to characterize or evaluate changes in contaminant concentrations.

The five project life cycle stages do not cover all possible project situations, but are provided to link statistical methods and tools to the typical waste management facility or contaminated site investigation, monitoring, or cleanup actions. Many of the project life cycle stages share common groundwater evaluation objectives, where groundwater statistical methods might be used.

The discussion in this section provides the following guidance for each life cycle stage in relation to the study questions:

selecting and characterizing a data set relevant to the study question
appropriate statistical methods and tools for the study question
interpretation of the results and the associated uncertainty

The discussion in the following sections does not provide guidance on the specific assessment or remediation tasks associated with each life cycle stage (such as groundwater remediation methods) but rather provides guidance on the use of statistics to support the groundwater evaluation objectives. Additional information about common mistakes in applying statistical methods is presented in Appendix B.

The number of samples that must be collected in order to apply the statistical methods varies; however, in general, more samples better characterize groundwater concentrations. USEPA recommends a minimum of 8 to 10 independent observations (Chapter 5.2.1, Unified Guidance) for most of the statistical tests. States may require a specific minimum number of observations by rule. Data sets of 20 or more observations may be possible, and methods to expand the data set, discussed in Chapter 5.2.6, Unified Guidance, can include additional sampling or pooling data from more than one well if the data characteristics allow (for example, use an analysis of variance (ANOVA) test to show concentrations do not differ) in order to statistically increase the number of samples. However, when pooling data, the statistical significance of a single contaminated well may be reduced by pooling with uncontaminated wells.

In addition, collecting samples that are not separated by sufficient time intervals can lead to redundant measurements. Such data are not statistically independent. Collecting samples separated by too long an interval may miss important aspects of the data record. See Study Question 9 for more information on temporal optimization methods.
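As a rough screen for redundant sampling, the lag-1 sample autocorrelation of a time-ordered series from one well can be computed. The sketch below is illustrative only: the function name is hypothetical, it assumes roughly equal sampling intervals, and it is not one of the temporal optimization methods referenced under Study Question 9.

```python
from statistics import fmean


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of a time-ordered series (hypothetical helper)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = fmean(values)
    # Total sum of squared deviations from the mean.
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Sum of products of deviations for consecutive sampling events.
    numer = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return numer / denom
```

Values near zero are consistent with independent observations. Strongly positive values may indicate that sampling events are too closely spaced, although a trend in the data can produce the same signal, so the result should be read alongside a time series plot.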

Ideally, the project planning process documents the data analysis procedures, including how nondetects are managed. If this was not established prior to sampling, decide during exploratory data analysis (EDA) how nondetect data will be handled in the data set. Parametric statistical tests typically use the mean and standard deviation of a data set, both of which may be substantially biased by using the detection limit or other substitution methods in these calculations. Nonparametric tests are less affected by a small number of nondetects, but the results of almost all statistical tests can be confounded by a large number of nondetects, especially if they are associated with varying detection limits. See Section 5.7 for guidance on how to handle nondetects in statistical analyses.
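The sensitivity of the mean and standard deviation to the substitution choice can be seen with a small calculation. This is a minimal sketch, not a recommended nondetect method: the data layout (value, detected flag) and the function names are assumptions made for illustration, and nondetects are assumed to carry their detection limit as the reported value.

```python
from statistics import fmean, stdev


def substitute_nondetects(results, fraction):
    """Replace each nondetect with fraction * detection limit.

    results: list of (value, detected) tuples; for nondetects, value is
    the detection limit (an assumed data layout for this sketch).
    """
    return [value if detected else value * fraction for value, detected in results]


def substitution_summary(results):
    """Mean and standard deviation under three common substitution rules."""
    summary = {}
    for label, fraction in (("zero", 0.0), ("half_dl", 0.5), ("full_dl", 1.0)):
        data = substitute_nondetects(results, fraction)
        summary[label] = (fmean(data), stdev(data))
    return summary


# Two nondetects at a detection limit of 5.0 and two detected results.
example = [(5.0, False), (5.0, False), (8.0, True), (12.0, True)]
summary = substitution_summary(example)
```

In this made-up example the mean ranges from 5.0 (zero substitution) to 7.5 (full detection limit), and the standard deviation shrinks as the substituted value rises, which is the kind of distortion the paragraph above warns about.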

Prior to analyzing the groundwater data and after the EDA, review the assumptions of the statistical tests. For example, some of the tests require that the data be derived from a normal statistical distribution. The assumptions for each of the statistical methods are discussed in detail in Section 5. Section 3.4 provides more information on the general statistical assumptions, and Appendix F includes information about checking the underlying assumptions of statistical tests.

When discussing the concept of background, it should be clarified whether background represents natural or anthropogenic conditions as well as whether background conditions are defined by location or by a time period that is uninfluenced by the site. Few sources exist to determine the analytical qualities of pristine groundwater, so groundwater monitoring results are typically compared to the conditions in wells unaffected by the contaminants associated with the activities being monitored. For example, monitoring wells upgradient of a landfill are generally compared to monitoring wells downgradient of the same landfill. Significantly greater contaminant concentrations in the downgradient wells than the upgradient wells would support the conclusion that contaminants were released from the landfill. Most guidance documents and research papers on determining natural or anthropogenic background concentrations primarily employ statistical techniques. However, geochemical evaluations may be used to identify potentially contaminated samples and naturally occurring metals (Thorbjornsen and Myers 2007).

Background data sets can be compared to site data using a variety of statistical tests depending on the geology, well development, and chemicals being examined as well as the release mechanism identified in the conceptual site model. The number of planned background wells or groundwater sampling points should reflect the site being investigated in terms of site size, the variability of the concentrations for a chemical, and analytical detection limits. Background may also be represented by a single value based on regulatory requirements or a value derived from literature. Not all background values reported in the literature are appropriate for comparison to site concentrations. Background values obtained from literature should be reviewed to determine whether the regional geology and land use for the values are similar to conditions at the site.

In locating or selecting the background wells or data set, consider the common requirements for statistical analyses. The monitoring wells should represent the same hydrogeologic unit and not be too close together or immediately up- or downgradient of one another. In determining proximity of wells, some factors to consider include formation transmissivity, range of contaminant concentrations, and size of area being investigated. Proper well placement assures that the water samples drawn from different wells are independent such that the groundwater from the same location is not being sampled twice. A sufficient number of samples must be collected to demonstrate that the analytical results do not correlate with either the time of collection or nearby wells. There also should be a sufficient number of samples to determine whether seasonality is important. In general, if the concentration of a chemical increases or decreases over time or is significantly higher in some background wells as compared to others, then the background data drawn from those wells may not be suitable for use as background.

To help avoid misrepresentation of chemicals in background groundwater, avoid placing background wells near typical, known sources of contamination such as burial areas and underground storage tanks. Collect basic geologic data to determine which sources of contamination should be of concern. The hydrogeologic characteristics such as groundwater movement, direction, volume, and stability all are important and are discussed further in Section 4.3.

The appropriate number of samples will be related to site conditions as characterized in the conceptual site model (CSM); see Section 3.6 for further discussion of statistical design and number of samples.

When sample data are available from both background and potentially impacted wells, plots can be used to graphically assess whether the two data sets are derived from the same or different statistical populations. Probability plots and box plots can be useful for such qualitative evaluations.

An upper criterion can be calculated from background data as either an upper prediction limit or upper tolerance limit. Control charts can be used as an alternate graphical tool to assess whether concentrations within a well (intrawell) have increased above a control limit based on data from an earlier (background) time period. Sample results from the potentially impacted wells are then compared to these limits to determine if a release has occurred. Prediction or tolerance limits are simple to implement, and their results are easy to communicate. Care must also be taken to account for high site-wide false positive rates (SWFPR) when multiple contaminants and multiple wells are being compared. When multiple comparisons are made at a fixed level of statistical significance, each repeat of the test has the same chance of incorrectly rejecting the null hypothesis, and those chances accumulate across repeats (increasing the risk of at least one mistaken decision).
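The accumulation of false positive risk across many comparisons can be illustrated numerically. The sketch below assumes the individual tests are statistically independent, which is a simplifying assumption of this example and not a claim made in this section; the function names are hypothetical.

```python
def sitewide_false_positive_rate(alpha, n_tests):
    """Chance of at least one false positive among n independent tests,
    each run at per-test false positive rate alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def per_test_alpha(target_swfpr, n_tests):
    """Per-test alpha that would hold the site-wide rate at target_swfpr,
    again assuming independent tests."""
    return 1.0 - (1.0 - target_swfpr) ** (1.0 / n_tests)


# Example: 10 wells x 5 constituents = 50 comparisons per sampling event.
swfpr = sitewide_false_positive_rate(0.05, 50)   # roughly 0.92
alpha_needed = per_test_alpha(0.10, 50)          # roughly 0.002
```

Under this assumption, 50 comparisons each run at a 5% false positive rate would produce at least one false alarm in most sampling events, which is why the per-test rate usually needs to be set well below the desired site-wide rate.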

Two-sample tests are an alternative to the prediction limit or tolerance limit. These tests compare the mean or mean rank of the potentially impacted wells with the same statistic for the background data. The parametric comparison tests are Welch's t-test and the pooled variance t-test; the nonparametric equivalents are the Wilcoxon rank sum test and the Tarone-Ware test. The t-test is sensitive to outliers; the nonparametric tests, which are based on ranks, are far less so.
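A rank-based comparison of this kind can be sketched with the standard library alone. The example below computes the Wilcoxon rank sum statistic for the potentially impacted wells and a one-sided p-value from the large-sample normal approximation. It is a simplified illustration: it uses midranks for ties but applies no tie or continuity correction to the variance, the approximation is rough for small samples, and the function names are hypothetical.

```python
from statistics import NormalDist


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(background, compliance):
    """Rank sum of the compliance data, z-score, and one-sided p-value
    for the alternative that compliance concentrations are higher."""
    n, m = len(background), len(compliance)
    ranks = midranks(list(background) + list(compliance))
    w = sum(ranks[n:])                       # rank sum of compliance data
    mean_w = m * (n + m + 1) / 2             # expected rank sum under H0
    sd_w = (n * m * (n + m + 1) / 12) ** 0.5
    z = (w - mean_w) / sd_w
    p_upper = 1.0 - NormalDist().cdf(z)
    return w, z, p_upper
```

Because only ranks enter the calculation, replacing the largest compliance value with a much larger one leaves the statistic unchanged, which illustrates the resistance to outliers noted above.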

When evaluating whether or not a release has occurred, the following methods are most applicable:

Prepare box plots or probability plots to qualitatively evaluate whether the background and potentially impacted data sets appear to be drawn from the same population.
Determine if the background samples appear to be from a single statistical population or if multiple aquifer characteristics exist (see one-way ANOVA or t-test, Wilcoxon rank sum test, Tarone-Ware), and assess whether background concentrations are stable (see parametric trend tests, Mann-Kendall trend tests).
Calculate an interval that represents a background distribution of concentrations of chemicals from one or a set of wells that are unaffected by a contaminant source or a contaminated site with a stated coverage and certainty. See prediction limits, control charts, and tolerance limits.
In some circumstances you may want to consider the following approach: use two-sample tests to determine if there is a difference between potentially impacted wells and background. See t-test, ANOVA, Wilcoxon rank sum test, Tarone-Ware.
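One simple instance of a background-based upper limit is a nonparametric upper prediction limit set at the maximum background value. The sketch below is a hedged illustration, not the procedure prescribed by this guidance: it assumes the background and future observations are independent draws from the same continuous distribution, it ignores retesting, and the function names are hypothetical. Under those assumptions, the chance that all of k future observations fall at or below the maximum of n background observations is n / (n + k).

```python
def nonparametric_upl(background, n_future=1):
    """Upper prediction limit equal to the background maximum, with the
    confidence that all n_future new observations fall at or below it."""
    n = len(background)
    if n == 0 or n_future < 1:
        raise ValueError("need background data and at least one future sample")
    limit = max(background)
    confidence = n / (n + n_future)
    return limit, confidence


def exceedances(limit, compliance_results):
    """Compliance results that are above the prediction limit."""
    return [x for x in compliance_results if x > limit]


# Example: 19 background results give 95% confidence for one future sample.
background = [2.1, 3.4, 2.8, 3.0, 2.5, 3.9, 2.2, 3.1, 2.7, 3.3,
              2.9, 3.6, 2.4, 3.2, 2.6, 3.5, 2.3, 3.7, 3.8]
limit, confidence = nonparametric_upl(background)
```

The confidence formula shows why small background data sets are limiting: with only 8 background observations and a single future comparison, the achievable confidence is about 89%, and it falls further as more wells or constituents are compared.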

The geology, geography, and climatic conditions associated with a site strongly influence groundwater hydrology. Site hydrology describes not only the presence or absence of water, but also its flow direction, its velocity, and its volume. The location of the aquifer and its character must be well understood in order to assure that the groundwater samples with which comparisons are made are drawn from the same sample population (same aquifer).

Surrounding bodies of water and local drainage patterns also influence hydrology. For example, it is possible for a local stream to be gaining during one season (groundwater moves into the stream) and losing during another season (stream water moves into groundwater) so that samples drawn from groundwater are influenced by differing sources of water in different seasons.

The natural (background) geochemical composition of the groundwater is directly related to the aquifer mineralogy and the rate of dissolution of the aquifer minerals into the groundwater. Most of these reactions are slow and only subtle changes in inorganic concentrations occur over time within a hydrogeologic unit. Faster reactions and changes can occur locally where there are changes in aquifer mineralogy such as a sandy stream channel within a limestone unit or an influx of groundwater with a different composition than the pore water of the aquifer such as from a losing stream.

Variables such as temperature, exposed surface area of minerals, pH, oxidation-reduction potential, and presence in solution of other ions will affect not only the rate of dissolution of minerals into the groundwater but also influence the precipitation or co-precipitation of minerals from the groundwater, potentially changing the groundwater chemistry. These factors will also affect the rate of absorption (uptake into the physical structure) of ions to the aquifer material or particulate matter in the groundwater itself.

Aquifer characteristics such as the type of aquifer solids, pore structure, and fracture systems may alter the transport of dissolved chemicals by physical mechanisms such as adsorption onto the aquifer matrix. Depending on the persistence of the contaminant, the same mechanisms that hinder movement may also allow the slow release of contaminants over time. Soil type (such as a clay mineral) can also affect the mass of contamination that may be released into groundwater over time by adsorbing the contamination onto the minerals within the soils. These contaminants are then slowly released over time into the groundwater or infiltrating rainwater. The characteristics of pores and fractures in rock aquifers are also important controllers in the interaction of contaminants with aquifer material.

While not often emphasized, the microbiology of the subsurface can change contaminant half-lives, support biodegradation, cause changes in water chemistry, and affect contaminant movement. For example, changes in biota can alter key characteristics such as pH, free metals, and dissolved oxygen concentrations. You should understand and monitor biological parameters if microbial activity affects the goals for the site. See the ITRC Environmental Molecular Diagnostics (ITRC 2013) web-based document for information about a group of advanced and emerging techniques (referred to as EMDs) that analyze biological and chemical characteristics of environmental samples.

The nature and extent of contamination should be understood sufficiently to support appropriate groundwater monitoring. Consider the following questions when evaluating existing contamination and its sources:

How many potential sources exist?
Are all present and potential sources releasing into the groundwater being monitored?
What are the contaminants?
Is contaminant co-mingling occurring from multiple potential source areas or does it have the potential to occur?
Is a contaminant being examined against a background concentration that may include a naturally occurring element like chromium? Or is monitoring focused on a contaminant that is not present in natural background but may be present in anthropogenic background such as polychlorinated biphenyls (PCBs)? See Section 4.2.1 for a detailed discussion on background.
With the above concerns in mind, what is the extent or mass of contamination?

The pathways for transport of contamination determine whether data are being appropriately evaluated and also help to anticipate how sample results may change over time.

Different geologic units within the same aquifer can have differing water quality. For example, for a saturated clay layer underlain by sand, the contaminant concentrations in the two geologic units can be significantly different due to different transmissivity. Some contaminants, such as depleted uranium, are generally immobile but can be mobilized under certain geochemical conditions. Transport mechanisms then become crucial to understanding results for these contaminants.

Comparison of Two Aquifers

A site being monitored has both an upper and lower aquifer and the lower aquifer is not in direct contact with the area of contamination. As a result, the upper aquifer may be contaminated while the lower aquifer may be clean. Statistical comparisons of the concentrations in these aquifers could support and help to refine the evaluation of groundwater transport pathways.

4.3.3.1 Contaminant Chemical and Physical Nature

The physical and chemical characteristics of the contaminant must also be considered in groundwater monitoring. A contaminant such as benzene, a light nonaqueous phase liquid (LNAPL), will both float on the surface of groundwater and dissolve into groundwater. If LNAPL is present, groundwater samples which are drawn from the water table can differ from deeper samples that only represent the dissolved phase. Benzene is also highly mobile and may travel further in groundwater than other contaminants, even those originating from the same source area. Trichloroethene, a dense nonaqueous phase liquid (DNAPL), is heavier than water and also dissolves in water, so it may exist below downgradient monitoring wells screened at the same depths as upgradient wells. An inorganic element such as arsenic can bind tightly to clay, but can be released by changes in parameters such as pH.

In order to properly characterize a site, all of the above contaminant characteristics must be considered. While some information may not be initially known, it is important that enough information be collected to support the CSM (see Section 3: General Statistical Approach).

4.3.3.2 Groundwater Monitoring Networks

Groundwater monitoring networks are built around either identified potential sources of contamination (such as a landfill, spill, or release) or are designed in an effort to identify and delineate unknown sources of contamination. In either case, these monitoring well networks should ideally be developed by examining the site history and geology and considering the following questions:

What was the material released and what was its chemical composition?
Where were materials stored, handled, processed or disposed?
Can the volume of material released be estimated?
Do available records shed light on the potential pathways by which contamination may have entered the soil and groundwater, such as drainage swales, underground tanks or piping, storm water or sanitary sewer systems?

More information on considerations for designing groundwater monitoring networks can be found in Section 3.6.

Historical records and data can provide context for site characterization, but site geography, geology, and hydrogeology are critical as well. Section 3.3.2 has more information on the issues associated with using historical data. Once the available site data have been collected and assessed, the information can be used to identify the possible pathways for contamination to migrate, the sources of contamination, the possible contaminants, and then to plan the initial site characterization investigations. When information that is needed to support a CSM is not available in the historical records, a plan must be developed to fill the data gaps. Soil and groundwater sampling are typically needed to delineate the contamination extent, to confirm pathways, and to differentiate sources. Once the initial investigation has been performed, graphical and statistical evaluations of the data can be used to identify the extent of contamination to the degree possible with the data collected. This information can be used to develop an initial CSM. In this iterative process, graphical data analysis and statistics help to direct data collection, which is used to refine and focus the CSM.

When conducting site characterization the following statistical methods are most applicable:

Determine whether or not the mean concentration of the contaminant is increasing or decreasing over time. If concentration is plotted versus time, either linear regression (a parametric test) or Mann-Kendall and Theil-Sen (nonparametric tests) can be used to verify this trend. When comparing the trends between wells or in different time frames, you should check to see if the slopes of the two regression lines are significantly different. See linear regression, Mann-Kendall trend test, and Theil-Sen trend line.
Determine whether individual wells within a monitoring well network have a contaminant concentration greater than that expected for a certain percentile (for example, 95th or 99th percentile) of the wells in the network historically. Calculation of an upper quartile, upper tolerance limits or upper prediction limits may help identify areas of highest concentration that may warrant further characterization.
Compare a data set to a criterion. Whether the criterion is a maximum contaminant level (MCL) or a similar regulatory value, first determine what the value is intended to convey. Does the criterion represent a not-to-exceed value, a mean value, or a value intended to represent a percentage of the population? Then compute an appropriate confidence interval on the data set to determine if the criterion has been exceeded. See parametric confidence interval, nonparametric confidence interval, and confidence interval for upper percentile.
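The Mann-Kendall trend test named in the first item above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the normal approximation with a continuity correction and a tie correction to the variance, which becomes rougher as the number of observations gets small, and the function name is hypothetical.

```python
from collections import Counter
from statistics import NormalDist


def mann_kendall(values):
    """Mann-Kendall S statistic, z-score, and two-sided p-value for a
    time-ordered series (normal approximation)."""
    n = len(values)
    # S counts later-minus-earlier increases minus decreases over all pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S with a correction for groups of tied values.
    tie_sizes = Counter(values).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18
    if s > 0:
        z = (s - 1) / var_s ** 0.5
    elif s < 0:
        z = (s + 1) / var_s ** 0.5
    else:
        z = 0.0
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p_two_sided
```

A positive S with a small p-value suggests an increasing trend and a negative S a decreasing one. Because only the signs of pairwise differences are used, the test does not assume normally distributed data.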

Selection of technologies for remediation of groundwater is commonly based on evaluation criteria. The Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA) remedy selection process evaluates potential remedial alternatives using nine criteria (USEPA 1988). These CERCLA criteria are used to select the best overall remedy for the site. State cleanup regulations may also contain similar evaluation criteria that are used for selection of remedial technologies. Although statistical analyses are not always directly relevant to remedy selection, statistics can, for example, support natural attenuation as a potential remedy.

When natural attenuation is being considered as a potential remedy, trend analyses for existing groundwater monitoring data can be used to evaluate short-term and long-term effectiveness and to predict remediation time frames. The results of these analyses can support a comparison of natural attenuation to other remedies. Because these analyses use existing data, the evaluation methods are essentially the same as those used for the evaluation of remedy effectiveness for a selected and implemented technology.

After a groundwater remedy has been implemented, statistical analysis of groundwater monitoring results can show the degree of remedy effectiveness. Analyses that can be used to evaluate remedy effectiveness include groundwater plume contouring, an examination of contaminant concentration versus time (temporal trends analysis) and an examination of contaminant concentration versus distance from the source (spatial trend analysis).

Contouring may be used to better understand the spatial pattern. Many software packages perform contouring; however, these packages often perform poorly with the sparse data sets typical of corrective action sites. As a result, hand contouring is often preferred (Siegel 2008). If a software package is used for contouring, carefully review the results for interpolation and extrapolation errors. Remedy effectiveness can be evaluated by 1) plotting the temporal trends on a map and evaluating the spatial pattern in the trends or 2) creating a series of maps and evaluating the change in spatial pattern over time before and during remedy operation (see Section 3.6.7: Does my monitoring network need to be optimized?).

Statistical evaluation of remedy effectiveness may employ a number of methods, and may address a variety of site parameters; some examples are listed below:

Determine whether the change in concentration over time represents a statistically-significant long-term trend (temporal trend analysis, see regression analysis and Mann-Kendall trend test).
Estimate the rate of concentration change over time (the attenuation rate, see Example A.5 and Example A.6). Use the confidence interval for this attenuation rate to evaluate the uncertainty in the estimate (see regression analysis and Theil-Sen test).
Evaluate the areal extent of remedy effectiveness by identifying the wells with higher attenuation rates. The confidence intervals for the attenuation rates can be used to determine whether the observed differences in attenuation rates are statistically significant (see confidence interval bands on regression analysis and Theil-Sen test).
Estimate future contaminant concentrations using the current concentration and the estimated attenuation rate. The confidence interval for the attenuation rate can be used to evaluate the uncertainty in the concentration estimate (see confidence interval bands on regression analysis and Theil-Sen test).
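The Theil-Sen trend line referenced in the items above estimates the rate of change as the median of all pairwise slopes. The sketch below is a simplified illustration with a hypothetical function name; it returns only the point estimate and omits the confidence interval, and it places the line through the medians of the times and values.

```python
from statistics import median


def theil_sen(times, values):
    """Theil-Sen slope (median of pairwise slopes) and an intercept that
    passes the line through the median time and median value."""
    n = len(times)
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
        if times[j] != times[i]          # skip pairs sampled at the same time
    ]
    if not slopes:
        raise ValueError("need at least two distinct sampling times")
    slope = median(slopes)
    intercept = median(values) - slope * median(times)
    return slope, intercept


# A steadily declining record with one anomalous final result.
slope, intercept = theil_sen([0, 1, 2, 3, 4], [10, 8, 6, 4, 100])
```

In this made-up record the single anomalous result does not change the estimated slope of -2 per time unit, whereas an ordinary least-squares fit to the same data would be pulled strongly upward.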

Comparative statistical tests can also be used to evaluate remedy effectiveness. Comparative tests are most commonly used to evaluate differences in performance parameter values between groups of spatially-associated wells (that is, wells identified as inside rather than outside a treatment area). Appropriate comparative tests, such as t-test and Wilcoxon rank-sum, are discussed in Section 5.11 of this guidance. Comparisons that may be made include contaminant attenuation rates, change in contaminant concentration before and after treatment, or change in concentration of treatment compound before and after treatment.

4.4.2.1 Statistical Methods for Remediation Objectives

When conducting remedy selection or remedy effectiveness activities the following methods are most applicable.

Estimate the rate of concentration change over time (the attenuation rate). Use the confidence interval for this attenuation rate to evaluate the uncertainty in the estimate (see regression analysis and Theil-Sen test).
Evaluate the areal extent of remedy effectiveness by identifying the wells with higher attenuation rates. The confidence intervals for the attenuation rates can be used to determine whether the observed differences in attenuation rates are statistically significant (see confidence interval bands on regression analysis and Theil-Sen test).
Estimate future contaminant concentrations using the current concentration and the estimated attenuation rate. The confidence interval for the attenuation rate can be used as a line of evidence to evaluate the uncertainty in the concentration estimate. However, note that any extrapolation of the attenuation rate or its associated confidence interval beyond the available data range likely includes much greater uncertainty in the projected concentrations from the statistical estimates (see confidence interval bands on regression analysis, Theil-Sen test, and Example A.2).
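The attenuation rate and concentration projection described above can be sketched with a regression of log concentration on time. This example assumes first-order (exponential) decay, which is a modeling assumption of the sketch and not a requirement stated in this section; the function names are hypothetical. The standard error of the slope is returned so that a confidence interval can be formed with a t critical value for n - 2 degrees of freedom taken from a table.

```python
import math


def attenuation_rate(times, concentrations):
    """Least-squares fit of ln(concentration) versus time.

    Returns (rate, intercept, se_slope), where rate = -slope is the
    first-order attenuation rate and intercept is ln of the fitted
    concentration at time zero."""
    n = len(times)
    if n < 3:
        raise ValueError("need at least 3 observations")
    y = [math.log(c) for c in concentrations]
    t_bar = sum(times) / n
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (yi - y_bar) for t, yi in zip(times, y))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    residual_ss = sum((yi - (intercept + slope * t)) ** 2 for t, yi in zip(times, y))
    se_slope = math.sqrt(residual_ss / (n - 2) / sxx)
    return -slope, intercept, se_slope


def time_to_criterion(rate, intercept, criterion):
    """Projected time at which the fitted line reaches the criterion, or
    None when concentrations are not declining."""
    if rate <= 0:
        return None
    return (intercept - math.log(criterion)) / rate
```

Repeating the projection with the lower and upper confidence limits of the rate gives a range of cleanup times. As cautioned above, projections beyond the monitored period carry more uncertainty than that range alone conveys.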

As discussed in Section 3.6, before implementing a monitoring program, consider the statistical design of the program and the methods that will be used to statistically analyze the measurements. These choices impact the kinds of data that must be collected and the frequency of monitoring. For example, routine, periodic groundwater monitoring lends itself to the use of prediction limits with retesting (see Section 5.4) in order to assess whether concentration levels exceed background. To do this appropriately, however, (1) data representing background concentrations must be collected from either dedicated background wells (interwell testing) or from (earlier) uncontaminated sampling events at compliance wells (intrawell testing); (2) the background concentrations should be stationary (stable, nontrending); and (3) there should be enough background observations to give the prediction limit a reasonable chance of identifying a significant change in concentrations (adequate statistical power).

No matter what evaluation methods are selected, always first graph your data on time series plots. This simple graphical procedure can be used both to help verify that background concentrations are stable (stationary) and to reveal apparent trends over time. Time series plots can also reveal the presence of seasonal or cyclical patterns, which might necessitate special data adjustments (de-seasonalization) or test methods specifically adapted for seasonal data (such as seasonal Mann-Kendall).
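One simple form of de-seasonalization subtracts each season's mean from its observations and adds back the overall mean. The sketch below illustrates that adjustment only; the season labels, the function name, and the choice of this particular adjustment are assumptions made for the example, and it presumes that each season is sampled often enough for its mean to be meaningful.

```python
from collections import defaultdict
from statistics import fmean


def deseasonalize(seasons, values):
    """Remove season-to-season differences in mean level.

    seasons: a label per observation (for example "wet" or "dry").
    Returns values adjusted as value - season mean + grand mean."""
    grand_mean = fmean(values)
    by_season = defaultdict(list)
    for season, value in zip(seasons, values):
        by_season[season].append(value)
    season_mean = {s: fmean(vs) for s, vs in by_season.items()}
    return [v - season_mean[s] + grand_mean for s, v in zip(seasons, values)]


adjusted = deseasonalize(["wet", "dry", "wet", "dry"], [10.0, 4.0, 12.0, 6.0])
```

In this made-up record the wet and dry results differ by about 6 units, which masks a small year-over-year increase; after adjustment the series reads 7, 7, 9, 9 and the increase is visible while the overall mean is unchanged.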

In cases where trends are apparent on time series plots, formal trend tests (for example, linear regression, Mann-Kendall, Theil-Sen) can be used to verify whether or not a statistically significant trend exists. Statistical trend methods are also applicable if the purpose of monitoring is to identify the rate at which groundwater contaminants are diminishing, or if attenuation is occurring more quickly in one location over another. Similarly, trend analysis may be used to evaluate the natural attenuation of a contaminant in groundwater. Evaluating and identifying trends in concentrations is an important line of evidence to support monitored natural attenuation (MNA) as part of a groundwater remedy.
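
A minimal sketch of a Mann-Kendall trend test follows, using only the Python standard library. It applies the normal approximation with no correction for tied values, which is a simplification usually considered reasonable only for roughly ten or more observations. The function name and the data are hypothetical.

```python
import math
from statistics import NormalDist

def mann_kendall(values):
    """Return (S, Z, two-sided p-value) for values ordered in time.

    Normal approximation, no tie correction (a simplifying assumption).
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # +1, 0, or -1 for each pair
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, z, p

# Hypothetical quarterly concentrations (ug/L) at one well.
conc = [48.0, 45.0, 46.0, 41.0, 39.0, 40.0, 35.0, 33.0, 30.0, 31.0]
s, z, p = mann_kendall(conc)
print(f"S = {s}, Z = {z:.2f}, p = {p:.4f}")
```

A negative S with a small p-value is evidence of a decreasing trend; the test says nothing about the magnitude of the trend, which is why it is often paired with a slope estimate.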

A common compliance goal during long-term monitoring is to determine whether groundwater concentrations meet, exceed, or have dropped below a numerical criterion or decision criterion. Under some regulatory programs, an extensive monitoring program may be established when a release occurs. At this point, statistical tests are used to test whether or not the concentrations exceed a specified criterion. Additionally, monitoring may be used to demonstrate that remediation activities have lowered concentrations below a criterion for cleanup. In both settings, a type of confidence interval or limit is an appropriate statistical method. But in selecting a specific method, consider first what the numerical criterion is meant to represent as a statistical quantity, for example a long-term average or an upper percentile. Decision criteria can be established as maximum contaminant levels (MCLs), alternate compliance limits (ACLs), background limits, risk-based concentrations for protection of human health and the environment, or other bases. Most of these criteria are designed to be long-term averages based on chronic exposures; more rarely, a criterion may be based on acute or episodic exposures and thus more akin to a concentration upper percentile. The key statistical principle is to match the type of statistical interval to the type of criterion (for example, using a confidence interval around the mean when comparing against a long-term average-based MCL).

An important corollary to this discussion is the need for multiple, independent statistical measurements with which to decide whether or not groundwater concentrations meet or exceed any criterion with a high degree of statistical confidence (the likelihood that a range of values will contain the population parameter of interest; NIST/SEMATECH 2012). To be specific, one observation below a criterion does not prove that the maximum or mean concentration of the contaminant population is below the criterion. Neither does one concentration above a criterion indicate that the decision criterion has been violated.

Optimization of a groundwater program can occur at any stage of the life cycle, especially if it makes the program more accurate, efficient, and cost-effective (see, for instance, the resources and options compiled by the Navy’s Facilities Command (NAVFAC) in its Optimization Roadmap (US Navy 2013a)). Statistical optimization of monitoring networks is generally practical during long-term monitoring when a larger amount of data has been accumulated or the number of wells is more extensive, or both. In that setting, the optimization objective is efficient data collection, in which the right amount of data is collected in order to make accurate decisions in a cost-effective manner.

Statistical methods can be used to judge whether a monitoring program is optimized. At a very high level, this involves (1) estimating the degree of statistical correlation (an estimate of the degree to which two sets of variables vary together, with no distinction between dependent and independent variables; USEPA 2013b) or redundancy between sampling events or sample locations, or both, and (2) estimating the statistical uncertainty associated with trends or spatial maps of concentrations. Similar sample results from neighboring wells or closely timed events indicate a positive correlation among the observations and possibly statistical redundancy. An optimized sampling and network design tends to show little redundancy while retaining sufficient data to enable accurate and defensible decisions. Importantly, statistical optimization can lead to either more or fewer monitoring wells, sampling events, or monitored chemicals, depending on which design best meets the project goals. Results of any optimization should also be compared to what is known or hypothesized about the site in the conceptual site model (CSM).
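
As a first look at possible redundancy, the correlation between results from two neighboring wells sampled on the same dates can be computed directly. The sketch below uses the Pearson correlation coefficient; the well data and function name are hypothetical. A high correlation is only a screening indicator, since the optimization tools discussed in this section rely on more involved temporal and geostatistical methods.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical concentrations (ug/L) from two nearby wells on the same eight dates.
well_a = [12.0, 11.5, 10.8, 10.1, 9.7, 9.0, 8.6, 8.1]
well_b = [14.2, 13.9, 12.7, 12.2, 11.5, 10.9, 10.1, 9.8]

r = pearson_r(well_a, well_b)
print(f"correlation between wells: {r:.3f}")
```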

Typically four modes of optimization are either directly statistical or substantially affect statistical decision making:

Choice of monitoring parameters (chemicals). The more parameters that must be collected and statistically analyzed, the greater the cost of the monitoring program, the greater the risk of making false positive decision errors, and the greater the site-wide false positive rate (SWFPR; see Section 3.6). In hypothesis testing, a false positive (Type I) error occurs when the null hypothesis (H₀) is true but is rejected in favor of the alternate hypothesis (Hᴀ) (Unified Guidance). The false positive rate, or α (alpha), is the frequency at which this error occurs and is the significance level of a hypothesis test; for a test at the α = 0.01 level of significance, there is a 1% chance of a Type I error (Unified Guidance). In an optimal program, parameters that are not directly or indirectly related to possible contaminant sources or waste composition, or which are primarily nondetect and therefore statistically non-informative, may not be useful for routine monitoring. Furthermore, as discussed in Section 3.6, if a regulatory program requires or recommends a fixed SWFPR, the fewer the parameters, the greater the statistical power associated with each of those tests for detecting real changes in groundwater quality.
Choice of data collection (see Section 3.6). For some parameters, it may be feasible to use less expensive field screening techniques or temporary sampling points (for example, Hydropunch) in order to collect a larger number of measurements over a broader area, while simultaneously reducing costs relative to traditional sampling and laboratory analysis of dedicated wells. Often, such data may be less precise than traditional groundwater measurements, but the statistical advantage is a much larger sample size, leading to greater overall statistical power and decision accuracy.
Temporal optimization (see Section 3.6 and Section 5.8). This statistical approach aims to optimize sampling frequencies, using one of several methods. One method, known as cost-effective sampling (CES), uses linear trend estimates and the statistical uncertainty of those trends to bin wells into less and more frequent sampling categories. Modifications of the CES approach have been incorporated into software tools like MAROS and the 3-Tiered Monitoring and Optimization Tool (3TMO). Another method is iterative thinning, based on constructing a trend and then determining how much of the sample data can be removed or ‘thinned,’ yet still allow the original trend to be accurately reconstructed. The greater the percentage of data removed, the greater the degree of redundancy, and the less sampling required for an optimal sampling frequency.
Spatial optimization (see also Section 3.6 and Section 5.14). This approach attempts to optimize the number and placement of wells in a monitoring network. Different approaches seek to measure either (1) statistical redundancy between sampling points to assess whether some of those locations can be dropped from routine monitoring, or (2) statistical uncertainty and how it varies across the site. Areas with high uncertainty and no or few wells are candidates for adding new sampling locations. In general, both of these tasks rely on geostatistical techniques involving a significant degree of complexity. Specialized software tools such as GTS, Summit Tools, and VSP (in addition to those referenced above) have been developed to perform these statistical analyses.
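
The link between the number of monitored parameters and per-test power under a fixed SWFPR can be made concrete with a small calculation. The sketch below assumes the individual tests are statistically independent, which is a simplification, and the function name and inputs are hypothetical.

```python
def per_test_alpha(swfpr, n_tests):
    """Per-test false positive rate that yields the target SWFPR.

    Assumes n_tests independent tests, so that
    SWFPR = 1 - (1 - alpha_test) ** n_tests.
    """
    return 1.0 - (1.0 - swfpr) ** (1.0 / n_tests)

target_swfpr = 0.10  # hypothetical site-wide target
n_wells = 10         # hypothetical network size

for n_params in (2, 5, 10):
    alpha = per_test_alpha(target_swfpr, n_wells * n_params)
    print(f"{n_params} parameters: per-test alpha = {alpha:.5f}")
```

With fewer parameters, each test is allowed a larger per-test false positive rate, and for a given sample size a larger per-test rate corresponds to greater power to detect a real change.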

Calculate the monotonic trends of concentrations over time at a single location to identify statistically significant concentration trends (see linear regression, Mann-Kendall trend test, and Example A.2).
Estimate attenuation rates (rates of change), and use confidence intervals for the attenuation rate to quantify its uncertainty (see linear regression, Theil-Sen trend line).
Compare the estimated attenuation rates in two wells by comparing the slopes. This comparison does not, however, demonstrate the relationship between the wells (see confidence interval bands on linear regression, Theil-Sen trend line, and time series plots).
Calculate a confidence interval for a monotonic trend around the criterion to estimate when compliance can be reached (see confidence interval bands on linear regression and Theil-Sen trend line). A monotonic trend is the long-term movement in an ordered series which, regarded together with the oscillation and random component, generates observed values that are entirely increasing or decreasing (EPA 2006c).
As appropriate, consider the adequacy of sampling (both events and wells) to meet project objectives. Relevant tools are iterative thinning, CES, and spatial analyses.
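
One way to estimate an attenuation rate, sketched below, is to take the Theil-Sen slope (the median of all pairwise slopes) of log-transformed concentrations against time. Working on the log scale assumes first-order attenuation, which is an assumption of this sketch rather than a requirement stated here, and the data are hypothetical. No confidence interval is computed in this short example.

```python
import math
from statistics import median

def theil_sen_slope(times, values):
    """Median of the slopes between all pairs of points with distinct times."""
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i in range(len(times))
        for j in range(i + 1, len(times))
        if times[j] != times[i]
    ]
    return median(slopes)

# Hypothetical annual concentrations (ug/L) at a single well.
years = [0, 1, 2, 3, 4, 5]
conc = [120.0, 95.0, 80.0, 61.0, 52.0, 40.0]

# Slope of ln(concentration) versus time; its negative is the attenuation rate.
rate = -theil_sen_slope(years, [math.log(c) for c in conc])
print(f"estimated attenuation rate: {rate:.3f} per year")
```

Because the estimate is a median of pairwise slopes, a single outlying result has less influence on it than on a least-squares regression slope.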

Statistical tools for comparison of groundwater concentrations to a fixed criterion include confidence intervals or a one-sample hypothesis test. For this comparison to be valid, the sample population must be stable, with no increasing or decreasing trends. If groundwater concentrations are above the criterion and are changing over time, a trend analysis should be conducted. A confidence band around the trend line can be estimated and compared to the criterion to determine when compliance can be reached. The site owner may also evaluate other closure options, such as closure with institutional or engineering controls, or may need to take additional action (remedial or other) to address the remaining contamination.

The choice of confidence interval should be based on the type of fixed criterion to which the groundwater data will be compared. State or federal regulatory programs determine the appropriate statistical parameter for comparison to a criterion. If a mean-based or median-based parameter is chosen (the median being the 50th percentile of an ordered set of samples; Unified Guidance), fairly straightforward confidence interval testing is implemented. If the maximum or not-to-exceed criterion is the regulatory goal, then the program must identify a specific upper proportion and confidence level that the criterion represents. If nonparametric upper proportion tests must be used for the maximum or not-to-exceed criterion, then it will be very difficult to document compliance (see Chapter 5, Unified Guidance) because of the large number of samples required. Take care not to compare a confidence interval for an upper percentile concentration (such as the upper 95th percentile for a maximum or not-to-exceed criterion) with a confidence interval constructed around the arithmetic mean, which is the sum of a list of numbers divided by the number of values (Stark 2013), under a mean-based or median-based criterion; see Chapter 5, Unified Guidance.
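
The sample-size burden of nonparametric upper proportion tests can be illustrated with one simple case: how many samples are needed before the sample maximum serves as an upper confidence limit on a given upper percentile. The sketch assumes independent, identically distributed, continuous data; the function name is hypothetical.

```python
import math

def min_samples_for_max_as_ucl(proportion, confidence):
    """Smallest n such that the sample maximum exceeds the given population
    percentile with at least the stated confidence.

    Assumes i.i.d. continuous data, so P(max < percentile) = proportion ** n.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(proportion))

for proportion in (0.90, 0.95, 0.99):
    n = min_samples_for_max_as_ucl(proportion, 0.95)
    print(f"{proportion:.0%} coverage at 95% confidence: n = {n}")
```

Under these assumptions, 59 samples, all below the criterion, are needed to state with 95% confidence that at least 95% of the population lies below it.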

To show compliance with a fixed criterion, groundwater concentrations must not be increasing with time. Check for trends and make corrections as needed to ensure validity of the statistical evaluation. If contaminant concentrations remain above designated fixed criteria, then demonstration of a stable or downward trend, combined with institutional controls or engineering controls, may be sufficient to justify closure for the site.

Compare a data set to a numerical criterion. A criterion may be a maximum contaminant level (MCL), a risk-based concentration, or a fixed background concentration. Comparisons to a criterion are generally one-sample tests based on confidence interval testing against a fixed criterion, well by well and chemical by chemical. Pooled data from multiple wells can be compared to a numerical criterion if the numerical criterion was developed based on a limit consistent with such comparisons.
Evaluate the upper confidence limit (UCL) against the criterion. Choose a confidence interval consistent with the basis of the criterion. If the criterion represents an average concentration, the UCL of the mean or median concentration of the monitoring wells should not exceed the criterion. If the criterion represents an upper percentile or maximum, no more than a small, specified fraction of the individual concentration measurements should exceed the criterion (see confidence intervals for more information).
In some cases, background is used to demonstrate that a site is suitable for closure. Test to see if contaminant concentrations are not different from background. If a fixed background value has been established, use single-sample confidence interval methods to compare concentrations at closure to background. Use two-sample methods for interwell comparison, comparison of compliance wells to background wells, and comparison of compliance wells to established site background. For more information, see Shewhart-CUSUM (cumulative sum) control charts (intrawell), tolerance limits, prediction limits, t-test or Wilcoxon rank sum (Mann-Whitney), one-way ANOVA, and Kruskal-Wallis test (interwell).
To show compliance with a fixed criterion using standard confidence intervals, concentrations must not be increasing. If the data are not stable, a trend should be estimated along with a confidence band around the trend. Then compliance can be documented if a point in time occurs at which the confidence band drops below the criterion and remains so. Alternatively, if contaminant concentrations remain above designated fixed criteria, demonstrating a stable or downward trend, combined with institutional controls, engineering controls, or land use controls, may justify closure for the site (see linear regression and Theil-Sen trend line).
Evaluate the areal extent of remedy effectiveness by identifying the wells with higher attenuation rates. Use the confidence intervals for the attenuation rates to determine whether the observed differences in attenuation rates are statistically significant (see linear regression and Theil-Sen trend line).
Estimate future contaminant concentrations using the current concentration and the estimated attenuation rate. Use the confidence interval for the attenuation rate to evaluate the uncertainty in the concentration estimate (see linear regression and Theil-Sen trend line).
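
For a criterion that represents an average concentration, the UCL comparison described in this list can be sketched as follows. The example assumes stable (nontrending), approximately normal data and a one-sided 95% Student's t interval. The critical value is entered by hand from a t table to keep the example free of external libraries; all data and names are hypothetical.

```python
import math
from statistics import mean, stdev

def ucl_of_mean(values, t_crit):
    """One-sided upper confidence limit on the mean: mean + t * s / sqrt(n)."""
    n = len(values)
    return mean(values) + t_crit * stdev(values) / math.sqrt(n)

# Hypothetical results (ug/L) from eight independent sampling events at one well.
data = [3.1, 2.8, 3.6, 2.9, 3.3, 3.0, 3.4, 2.7]
T_95_DF7 = 1.895   # one-sided 95% t critical value for n - 1 = 7 degrees of freedom
criterion = 5.0    # hypothetical mean-based criterion

ucl = ucl_of_mean(data, T_95_DF7)
print(f"mean = {mean(data):.2f}, 95% UCL = {ucl:.2f}")
print("UCL below criterion:", ucl < criterion)
```

Comparing the UCL, rather than the sample mean or any single result, to the criterion is what provides the stated level of confidence that the true mean is below it.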
