5.6 Distributional Tests
Distributional tests are commonly used to evaluate data distribution and to test data for normality. Many commonly applied statistical tests are parametric; that is, they assume that the data follow a specific distribution with a certain shape that can be described by a few parameters, such as the mean (a measure of centrality) and the standard deviation (a measure of spread).
Of the many different types of distributions used in statistics, the most commonly used are the normal distribution (also known as the bell curve) and distributions that can be transformed to a normal distribution (such as a lognormal distribution). In addition, the gamma and Weibull distributions are used. The normal distribution (bell curve) is well known because of its common use in scholastic grading. This curve plots the frequency of occurrence on the vertical axis and the ordered values of interest, in our case concentration, on the horizontal axis. If the data follow a normal distribution, most of the concentrations are near the mean, or average, value, and the likelihood of obtaining values away from the mean in either direction tapers off the further the concentration is from the mean.
Appendix A includes several case examples that illustrate the evaluation of groundwater data distributions.
The mathematical model of the normal distribution produces a perfectly smooth, symmetrical, bell-shaped curve. The mean and standard deviation of the data determine the shape of the bell. The mean locates the bell peak on the horizontal axis, and the standard deviation determines the width of the bell. A large standard deviation means that the bell will be broad and flat. A small standard deviation means that the bell will be narrow and skinny (the concentrations in the data set do not deviate much from the mean).
A histogram presents a rough depiction of the data distribution that can be matched with the mathematical model of the normal curve. The histogram orders the values, counts the number (frequency) of values falling within each fixed range of values (a bin), and plots the frequency of values within each bin on the y-axis at the bin's central value on the x-axis.
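As a rough sketch of this procedure, the histogram computation can be reproduced with NumPy. The synthetic "concentration" data, bin count, and variable names below are illustrative assumptions, not part of this guidance:

```python
import numpy as np

# Synthetic concentrations drawn from a normal distribution (illustrative only)
rng = np.random.default_rng(42)
concentrations = rng.normal(loc=10.0, scale=2.0, size=500)

# Count the frequency of values falling within each fixed range (bin)
counts, edges = np.histogram(concentrations, bins=10)
bin_centers = (edges[:-1] + edges[1:]) / 2

# For roughly normal data, the tallest bin should sit near the mean,
# with frequencies tapering off toward both tails
peak_center = bin_centers[np.argmax(counts)]
```

Plotting `counts` against `bin_centers` (for example, with matplotlib) yields the bell-shaped bar chart that can be compared against the normal curve.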
The first task when using a parametric test is to test the underlying assumption of normality. If the data do not produce a nicely shaped bell, for example, if the bell is lopsided or has several peaks, then the underlying mathematical model for the test will not match the data and may produce erroneous results. Other complications might cause the data to appear non-normal, such as outliers, the presence of nondetects, or changes over space or time (nonstationarity). Testing for normality should be conducted in conjunction with tests for outliers and nonstationarity.
Nondetects are left-censored data, meaning that below a certain reporting limit the concentrations are not known. Most tests for normality depend on the values at the ends, or tails, of the ordered data. Too many nondetects in the data set (the Unified Guidance recommends no more than 10-15% nondetects) can cause problems with normality tests because the concentrations at the lower tail of the sample distribution are unknown, yet a value is needed for a standard normality test to be run. Use caution in substituting values for nondetects, even at low percentages of nondetects. Apply nonparametric methods if there is doubt regarding the usability of the data due to the presence of nondetects. See Section 5.7: Managing Nondetects in Statistical Analyses for more information on nondetects.
Outliers are anomalous data found at the tails of data distributions, so their presence may cause problems in testing for normality. If outliers are suspected and a test for normality fails, try removing the suspected outliers and rerunning the test. See Section 5.10: Identification of Outliers for more information on outliers.
Nonstationarity can be an issue with data collected over space or time. Changes in concentration over time, or inconsistency of data across a large area, may introduce data that do not share the same distribution. Distribution tests might fail when data sets are grouped together even if the original data sets are independently normally distributed. Trend tests or analysis of variance (ANOVA) tests should be used if nonstationarity is suspected. See Section 3.4.6, Section 5.5, and Section 5.8 for more information on evaluating stationarity.
Many specific methods can test for normality of data distributions, including goodness-of-fit tests, which compare a chosen distribution with the data set of interest. The following are commonly applied methods:
- Coefficients of skewness and variation
- Kolmogorov-Smirnov test
- Graphical assessment of normality (probability plots)
- Shapiro-Wilk test
- Shapiro-Francia normality test
5.6.1 Coefficients of Skewness and Variation
Because a normal, bell-shaped distribution is symmetric about the mean, normally distributed data will have zero skewness. Therefore, measuring the degree of skewness aids in evaluating data for normality and in evaluating the degree of non-normality. A coefficient of skewness greater than one indicates that the data are not normally distributed. Also, because of the symmetry of the normal curve, the median value will be equal to the mean value. The coefficient of variation (the standard deviation divided by the mean) will also provide some measure of departure from normality. A coefficient of variation greater than one similarly indicates that the data are not normally distributed.
Calculation of the coefficients of skewness and variation can aid in evaluating a data set for normality.
- These methods are not appropriate for data that have been changed by a log transformation.
- Use of a minimum of 8 to 10 values is recommended; a larger data set may be required if data are skewed or contain nondetects.
- See Section 5.7 for information on handling nondetects.
- These methods are useful for a quick and easy evaluation of data that will reveal a possible non-normal distribution.
- These methods do not confirm normality, but can provide evidence against normality. Therefore, these methods should be used in conjunction with other tests.
Chapter 10.4, Unified Guidance includes discussion of the coefficient of variation and coefficient of skewness.
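As a hedged illustration of these screening statistics, the helper functions below (hypothetical names, using NumPy) compute the coefficient of skewness and the coefficient of variation for a small data set:

```python
import numpy as np

def coefficient_of_skewness(x):
    """Sample coefficient of skewness (third standardized moment,
    with the common bias-corrected n/((n-1)(n-2)) factor)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = x.std(ddof=1)
    return n / ((n - 1) * (n - 2)) * np.sum(((x - x.mean()) / s) ** 3)

def coefficient_of_variation(x):
    """Standard deviation divided by the mean."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / x.mean()

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]           # symmetric: skewness is 0
right_skewed = [1.0, 1.0, 2.0, 2.0, 3.0, 10.0]  # one high value drags the tail
```

A coefficient of skewness or coefficient of variation greater than one for a data set like `right_skewed` would flag possible non-normality under the screening rule described in this subsection; note that these toy data sets are below the recommended minimum of 8 to 10 values and serve only to show the arithmetic.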
5.6.2 Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov test (K-S test) is a common nonparametric goodness-of-fit test that compares the measured data distribution function with the normal distribution function (the mathematical model that generates the normal distribution). Thus, the K-S test compares the graphical curve (in this case, a cumulative fraction plot) of the measured data with the normal cumulative fraction plot. The method then calculates the maximum distance between the two curves and estimates the p-value. A p-value greater than the selected significance level indicates that the data likely fit a normal distribution. A p-value below the selected significance level indicates that the data do not fit a normal distribution.
- Goodness of fit tests are used to test the assumption of normality prior to applying other statistical tests.
- Study Question 9: Is the sampling frequency appropriate (temporal optimization)?
- The K-S test applies only to continuous distributions, but these distributions are usually expected in environmental systems.
- If the K-S test fails (p-value is less than the selected significance level), try transforming the data and re-testing for normality.
- Use of a minimum of 8 to 10 values is recommended; a larger data set may be required if data are skewed or contain nondetects.
- The K-S test is a robust test that considers only the relative distribution of the data; therefore, log-transformation of the data does not negatively affect this test.
- The test is more sensitive around the center of the curve than near the tails.
- The K-S test is not as powerful as the Shapiro-Wilk test.
Chapter 10, Unified Guidance provides information regarding fitting distributions to data sets.
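A minimal sketch of the K-S test using SciPy's `scipy.stats.kstest` follows; the data set and parameters are synthetic illustrations, and because the normal mean and standard deviation are estimated from the same data, the plain K-S p-value is known to be biased high (the Lilliefors adjustment addresses exactly this), so treat the result as approximate:

```python
import numpy as np
from scipy import stats

# Synthetic, approximately normal concentration data (illustrative only)
rng = np.random.default_rng(1)
data = rng.normal(loc=8.0, scale=1.5, size=40)

# Compare the empirical cumulative fraction curve with a normal CDF.
# Estimating the mean and sd from the data makes the test lenient
# (the Lilliefors correction exists for this situation).
stat, p_value = stats.kstest(data, "norm", args=(data.mean(), data.std(ddof=1)))

# stat is the maximum vertical distance between the two cumulative curves;
# a p-value above the chosen significance level (e.g., 0.05) is
# consistent with normality
```

For truly censored or heavily skewed data, the nonparametric alternatives referenced elsewhere in this section remain the safer choice.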
5.6.3 Shapiro-Wilk Test
The Shapiro-Wilk test calculates an SW value that indicates whether a random sample comes from a normal distribution. If a data set is normally distributed, then a correlation should exist between the ordered data and the normal distribution. Large values of SW indicate a strong correlation, while small values of SW are evidence of departure from normally distributed data. This test has performed well in comparison studies with other goodness-of-fit tests.
- Used to test for normality.
- If the SW value exceeds the critical value, the data set is probably normally distributed.
- If the SW value is less than the critical value, the test indicates that the data are not normally distributed. In this case, you may apply a data transformation and retest the transformed data for normality.
- Use caution when applying this method to data sets with a large number of nondetects; a larger number of detected values will give a better result.
- For best results, choose a significance level (α) = 0.10 for very small data sets (n < 10), α = 0.05 for moderately sized data sets (10 ≤ n < 20), and α = 0.01 for large data sets (n ≥ 20). This approach is not useful for very large data sets (n > 50).
- Because this method involves null hypothesis significance testing, if you reject the null hypothesis you may conclude that the population is not normally distributed. Rejection does not, however, indicate whether the non-normality is due to a flat-tailed distribution, a skewed distribution, or something else.
- If the null hypothesis is not rejected, you may only conclude that the test failed to show that the population is not normally distributed. In other words, the test can substantiate that the population is not normally distributed, but it cannot prove that the data set is normally distributed.
- The tests are influenced by power. If you have a small sample (n is the number of values), then the test may not have enough power to detect a departure from normality in the population. If you have a very large sample, then the test will detect even a trivial deviation from normality.
Chapter 10.5.1, Unified Guidance includes further information and an example for the Shapiro-Wilk test.
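As an illustrative sketch (synthetic data; SciPy assumed available), `scipy.stats.shapiro` returns the SW statistic and a p-value, and the last step shows the transform-and-retest workflow described in this subsection:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
normal_data = rng.normal(loc=50.0, scale=5.0, size=30)     # synthetic, normal
skewed_data = rng.lognormal(mean=1.0, sigma=1.0, size=30)  # synthetic, skewed

sw_norm, p_norm = stats.shapiro(normal_data)
sw_skew, p_skew = stats.shapiro(skewed_data)

# If the test rejects normality for the skewed data, a log transformation
# can be applied and the transformed data retested
sw_log, p_log = stats.shapiro(np.log(skewed_data))
```

Comparing each p-value against the chosen significance level (α) decides whether to reject normality; SW values near 1 indicate a strong correlation with the normal distribution.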
5.6.4 Shapiro-Francia Normality Test
The Shapiro-Francia test is a simplified version of the Shapiro-Wilk test and is generally considered equivalent to the Shapiro-Wilk test for large, independent samples. Like the Shapiro-Wilk test, the Shapiro-Francia test calculates an SF statistic to indicate whether a random sample comes from a normal distribution. If a data set is normally distributed, a correlation should exist between the ordered data and the z-scores taken from the normal distribution. Large values of SF indicate a strong correlation, while small values of SF are evidence of departure from normally distributed data. If the SF statistic exceeds the critical value, the test indicates that the data likely fit a normal distribution. If the SF statistic is less than the critical value, the test indicates that the data are not normally distributed; you may subsequently apply a data transformation and retest for normality.
- The Shapiro-Francia method is used to test for normality.
- Use caution when applying this method to data sets with a large number of nondetects; a larger number of detected values will give a better result.
- Because this method involves null hypothesis significance testing, if you reject the null hypothesis you may conclude that the population is not normally distributed. Rejection does not, however, indicate whether the non-normality is due to a flat-tailed distribution, a skewed distribution, or something else.
- If the null hypothesis is not rejected, you may only conclude that the test failed to show that the population is not normally distributed. In other words, the test can substantiate that the population is not normally distributed, but it cannot prove that the data set is normally distributed.
- The tests are influenced by power. If you have a small sample (n is the number of values), then the test may not have enough power to detect a departure from normality in the population. If you have a very large sample, then the test will detect even a trivial deviation from normality.
Chapter 10.5.2, Unified Guidance includes information about the Shapiro-Francia test.
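SciPy does not ship a Shapiro-Francia function, so the sketch below (a hypothetical helper, assuming NumPy and SciPy) implements the correlation idea described in this subsection: square the correlation between the ordered data and approximate normal z-scores (Blom plotting positions):

```python
import numpy as np
from scipy import stats

def shapiro_francia(x):
    """Approximate Shapiro-Francia statistic: the squared correlation
    between the ordered data and approximate expected normal z-scores."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    # Blom plotting positions approximate the expected normal order statistics
    z = stats.norm.ppf((np.arange(1, n + 1) - 0.375) / (n + 0.25))
    r = np.corrcoef(x, z)[0, 1]
    return r ** 2

# Synthetic examples: normal data should score near 1,
# strongly skewed data noticeably lower
rng = np.random.default_rng(3)
sf_normal = shapiro_francia(rng.normal(20.0, 4.0, size=50))
sf_skewed = shapiro_francia(rng.lognormal(0.0, 1.5, size=50))
```

In practice the statistic would be compared against tabulated critical values (such as those in the Unified Guidance) rather than interpreted by magnitude alone.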
Publication Date: December 2013