5.9 Time Series Forecasting
Time series forecasting predicts future values of a single observed time series (univariate) or of several time series (multivariate) over a specified forecasting horizon (time frame). For example, what might the anticipated concentration of a chemical be in a given compliance well in two years? Forecasts are based on a model fitted to present and past observations. Either an automated model or a user-specified model may be used. Time series forecasting follows on the discussion of the sample autocorrelation function (Section 5.8.3); review Section 5.8.3 if you are not familiar with time series and autocorrelation functions.
5.9.1 Automated models (such as Holt, Holt-Winters Forecasting)
For automated models such as Holt or Holt-Winters forecasting, a program automatically analyzes the data, selects forecasting techniques, and generates a forecast. In this approach, exponential smoothing procedures rely on simple, recursive updating equations: each forecast is a geometrically weighted sum of past observations, with more weight placed on recent observations and less on more distant ones. These procedures can also account for trends and seasonal variations (see Chatfield 1994). The smoothing parameter is generally chosen subjectively, between 0.1 and 0.3 (its exact value is typically not critical), but it can also be estimated numerically.
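The recursive updating equations described above can be sketched in a few lines. The example below is a minimal illustration of Holt's linear (level plus trend) exponential smoothing, not the procedure used by any particular software package. The function name `holt_forecast`, the initialization of the level and trend from the first two observations, the default smoothing parameters of 0.2, and the sample concentrations are all assumptions made for illustration.

```python
def holt_forecast(series, horizon, alpha=0.2, beta=0.2):
    """Holt's linear exponential smoothing (illustrative sketch).

    series  : observations at equally spaced times, oldest first
    horizon : number of future time steps to forecast
    alpha   : smoothing parameter for the level (assumed default 0.2)
    beta    : smoothing parameter for the trend (assumed default 0.2)
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")

    # Assumed initialization: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]

    # Recursive updates: each new observation pulls the level and trend
    # toward recent behavior, so older observations receive geometrically
    # decreasing weight.
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend

    # The h-step-ahead point forecast extends the last level along the trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical quarterly concentrations (made-up values, for illustration only).
concentrations = [4.1, 4.3, 4.2, 4.6, 4.8, 4.7, 5.0, 5.2]
print(holt_forecast(concentrations, horizon=2))
```

Holt-Winters forecasting extends the same recursion with a third, seasonal component; it is omitted here to keep the sketch short.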
5.9.2 User-specified models (such as ARIMA)
Autoregressive integrated moving average (ARIMA) procedures rely on the analyst's subjective judgment or knowledge to select an appropriate model from a broad class of available models for a given data series. An ARIMA model consists of autoregressive parameters (which explain a time series observation with past values) and moving average parameters (random shocks with an error structure that is usually Gaussian); the integrated portion of the model refers to the order of differencing (taking the difference between successive observations) applied to achieve stationarity in nonstationary data. Interpretation of correlograms (plots of the autocorrelation coefficients versus the time lags, also known as autocorrelation plots) produced by the autocorrelation function (ACF), as well as by the partial autocorrelation function, suggests which model might be appropriate. If the groundwater data series contains a trend, differencing the data (replacing each observation with its difference from the observation a fixed lag earlier) usually produces a stationary series, that is, one whose population characteristics do not change over time (Unified Guidance). The residuals of the fitted model must be analyzed to verify the appropriateness of the model (for example, with the portmanteau lack-of-fit test or the Durbin-Watson statistic). Forecasts for a given lead time can then be readily computed from the difference equations (Box and Jenkins 1976).
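The building blocks mentioned above (differencing, the sample autocorrelation function, and residual diagnostics) can be sketched as follows. This is a minimal illustration under stated assumptions, not a full Box-Jenkins workflow: the function names are hypothetical, the portmanteau statistic shown is the Ljung-Box form (one common variant of the portmanteau lack-of-fit test), and model fitting itself is not shown.

```python
def difference(series, lag=1):
    """Difference a series: x[t] - x[t - lag]. Lag 1 removes a linear trend."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Sample autocorrelation coefficients r[0..max_lag] (r[0] is always 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest little lag-1 autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def ljung_box_q(residuals, max_lag):
    """Ljung-Box portmanteau statistic Q over lags 1..max_lag.

    Q is compared with a chi-square distribution whose degrees of freedom
    are max_lag minus the number of fitted ARMA parameters (lookup not shown).
    """
    n = len(residuals)
    r = sample_acf(residuals, max_lag)
    return n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, max_lag + 1))


# Hypothetical trending series (made-up values, for illustration only).
series = [4.1, 4.3, 4.2, 4.6, 4.8, 4.7, 5.0, 5.2, 5.1, 5.5]
diffed = difference(series)              # first differences
print(sample_acf(diffed, max_lag=3))     # values that would be plotted in a correlogram
print(durbin_watson(diffed), ljung_box_q(diffed, max_lag=3))
```

In practice, the diagnostics would be applied to the residuals of the fitted model rather than to the differenced data; the differenced series is used here only so that the sketch runs without a model-fitting step.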
- The time series consists of at least eight observations, recorded at equally spaced (or nearly equally spaced) intervals in time. If seasonal variation is present, the time series should encompass at least two full cycles (for example, quarterly data require at least eight observations, and monthly data require at least 24 observations). A minimum of three full cycles is recommended if the seasonal variation is not clearly defined.
- Concentrations (or other variables) change in a set way with time (cyclical variation, steadily increasing/decreasing rate).
- Prediction intervals typically assume that the forecasts are unbiased, and that the forecast errors are normally distributed.
- If you suspect outliers (values unusually discrepant from the rest of a series of observations), examine the data (with a probability plot, Dixon's test, or Rosner’s test) and consider removing verified outliers.
- Select a level of confidence for the forecasts, such as 50% and 95%; this level of confidence may be determined by federal or state regulatory requirements or guidance.
- Forecast future values of an observed time series together with a prediction interval at a given confidence level (1 – alpha), because the uncertainty of point forecasts typically increases rapidly with increasing lead time.
- The forecast should be limited to one quarter of the length of the observed time series.
- Regularly update the time series, as soon as new observations become available, to decrease the forecast error for a given lag (forecast horizon).
- If data appear to be stationary, consider fitting a simple model, such as exponential smoothing, or an autoregressive moving average (ARMA) model with few parameters.
- If data appear to be nonstationary, consider either exponential smoothing with a trend term, or data transformation such as log or differencing before fitting a general integrated ARIMA model.
- If data exhibit seasonal fluctuations, consider either seasonal exponential smoothing, or differencing the data with a seasonal lag before fitting a general, additive or multiplicative ARIMA (or seasonal ARIMA) model.
- If the variance of the data changes with time, consider a model that allows the error variance to change over time, such as autoregressive conditional heteroscedasticity (ARCH) or generalized autoregressive conditional heteroscedasticity (GARCH). Heteroscedasticity is the inequality of the variances of error terms in a data set (Engle 2001).
- Consider the use of external predictor or explanatory variables, if additional information is available and relevant.
- The automatic (Holt-Winters) approach to time series forecasting is typically easier to implement and should be adequate for most groundwater data series where univariate forecasting is sufficient.
- The ARIMA (Box-Jenkins) approach requires a more thorough understanding of the underlying stochastic process and of the order of the autoregressive and moving average terms, but is useful for more sophisticated forecasting analyses, especially if correlations with other time series are also being considered.
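Several of the rules of thumb in the list above are simple arithmetic and can be sketched as code: the minimum series length, the one-quarter limit on the forecast horizon, and the widening of prediction intervals with lead time. The function names below are hypothetical. The interval calculation assumes a random-walk model (ARIMA with one order of differencing and no other terms) with unbiased, normally distributed forecast errors, chosen only because its forecast error variance has a simple closed form (the one-step error variance multiplied by the lead time); other models widen at different rates.

```python
import math
from statistics import NormalDist


def min_observations(period=None):
    """Minimum series length per the guidance above: eight observations,
    or two full seasonal cycles (period = observations per cycle)."""
    if period is None:
        return 8
    return max(8, 2 * period)


def max_horizon(n_obs):
    """Longest forecast horizon: one quarter of the observed series length."""
    return n_obs // 4


def random_walk_interval(last_value, sigma, lead, confidence=0.95):
    """Prediction interval for a random-walk forecast (an assumed model).

    The point forecast is the last observed value; the forecast error
    standard deviation grows as sigma * sqrt(lead), where sigma is the
    standard deviation of the one-step forecast errors.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma * math.sqrt(lead)
    return last_value - half_width, last_value + half_width


# Hypothetical example: 40 quarterly observations, one-step error sd of 0.3.
n_obs, period = 40, 4
print(n_obs >= min_observations(period), max_horizon(n_obs))
for lead in (1, 4, 10):
    print(lead, random_walk_interval(5.2, sigma=0.3, lead=lead))
```

The loop shows the interval widening as the lead time grows, which is the reason for pairing each point forecast with a prediction interval and for limiting the horizon.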
If you suspect that the observed time series is oscillatory and revolves around a constant mean, methods from the theory of linear prediction can also be used (see Yaglom 1962).
Publication Date: December 2013