## D.15 R FOR STATISTICS

Approximate Cost: Free

Source: http://www.r-project.org

Operating System Needs: Operates on Windows, Mac OS, and most versions of UNIX.

Input Structure: Scripts can be written in R to read and analyze data from a wide variety of sources including, but not limited to, text and binary files, spreadsheets, and databases.

Overview

According to the R FAQ (Hornik 2013), "R is a system for statistical computation and graphics consisting of a programming language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files." According to "An Introduction to R" (Venables et al. 2013), "R is an integrated suite of software facilities for data manipulation, calculation, and graphical display."

Several existing add-on packages extend the functionality of R. A partial list can be found at http://cran.r-project.org/doc/FAQ/R-FAQ.html#Add_002don-packages-from-CRAN.

Ease of Use and Data Import

The most common data structures in R are vectors and data frames. Higher-order data structures such as lists are also available for advanced analysis. The R environment may challenge a new user; however, an interactive user interface and comprehensive help documentation are provided. In addition, graphical user interfaces that give access to commonly used functions are under active development.
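As an illustrative sketch of these data structures and a simple import, the following R snippet builds a vector, a list, and a data frame, then reads comma-separated text into a data frame. The well names and concentration values are invented for illustration; in practice `read.csv()` would typically be given a file path rather than inline text.

```r
# Hypothetical data: well names and values are invented for illustration.
conc <- c(1.2, 0.8, 2.5, 3.1)             # numeric vector
site <- list(name = "MW-1", conc = conc)  # list holding mixed types

# Data frame built directly from vectors
wells <- data.frame(
  well = c("MW-1", "MW-1", "MW-2", "MW-2"),
  conc = conc
)

# Import: read.csv() normally takes a file path; inline text is used
# here only so the example is self-contained.
csv_text <- "well,conc
MW-1,1.2
MW-1,0.8
MW-2,2.5
MW-2,3.1"
imported <- read.csv(text = csv_text)

str(imported)  # show the column types R inferred

# Mean concentration per well
mean_by_well <- tapply(imported$conc, imported$well, mean)
print(mean_by_well)
```

The built-in help system mentioned above can be queried for any of these functions, for example with `?read.csv` or `help("data.frame")`.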

Types of Distributions

R can be used to calculate properties of probability distributions and to check whether a given data set fits a standard distribution. A number of distributions and distributional tests are supported in R, including: beta, binomial, Cauchy, chi-squared, exponential, F, gamma, geometric, hypergeometric, lognormal, logistic, negative binomial, normal, Poisson, Student's t, uniform, and Weibull.
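A minimal sketch of how distribution properties and a distributional check might look in base R follows. The data are simulated (an assumption made purely for illustration), and the Shapiro-Wilk test is only one of several normality tests available; it is shown here on raw and log-transformed values as an example of checking for lognormality.

```r
# Simulated data: an assumption for illustration, not real monitoring results.
set.seed(42)
x <- rlnorm(50, meanlog = 0, sdlog = 1)  # 50 draws from a lognormal

# Distribution functions follow a d/p/q/r naming pattern:
# density, cumulative probability, quantile, random draws.
d0   <- dnorm(0)                       # standard normal density at 0
p196 <- pnorm(1.96)                    # P(Z <= 1.96)
q975 <- qnorm(0.975)                   # 97.5th percentile of Z
pg   <- pgamma(2, shape = 2, rate = 1) # gamma cumulative probability

# Shapiro-Wilk normality test on raw and log-transformed data
sw_raw <- shapiro.test(x)
sw_log <- shapiro.test(log(x))
print(c(raw = sw_raw$p.value, log = sw_log$p.value))
```

The same prefix pattern applies to the other listed distributions, for example `pweibull()`, `qbeta()`, or `rpois()`.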

Visualization

R has a mature graphics library and can produce presentation-quality graphics for most of the commonly used plots, such as stem and leaf, box plots, scatter plots, histograms, and contours.
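The plot types named above can be sketched with base R graphics as follows. The concentrations and well names are simulated for illustration, `volcano` is an example matrix that ships with R, and writing to a temporary PDF file is just one possible output choice.

```r
# Simulated concentrations for two hypothetical wells (illustration only).
set.seed(1)
dat <- data.frame(
  well = rep(c("MW-1", "MW-2"), each = 20),
  a    = rlnorm(40),
  b    = rlnorm(40)
)

stem(dat$a)  # stem-and-leaf display, printed to the console

# Send the remaining plots to a PDF file in a 2 x 2 layout
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 8, height = 8)
par(mfrow = c(2, 2))
boxplot(a ~ well, data = dat, main = "Box plot", ylab = "Concentration")
hist(dat$a, main = "Histogram", xlab = "Concentration")
plot(dat$a, dat$b, main = "Scatter plot",
     xlab = "Chemical A", ylab = "Chemical B")
contour(volcano, main = "Contour")  # built-in example elevation matrix
dev.off()
```

Replacing `pdf()` with `png()` or omitting the device call in an interactive session sends the same plots to an image file or to the screen.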

Primary Uses for Groundwater Data Analysis

R is commonly used to perform the following tasks:

Benefits

• provides a flexible, interactive, and powerful environment for data analysis and visualization
• free
• built-in support for a variety of statistical analyses, from simple to complex
• scripts for performing complex analysis
• easily produces presentation-quality graphics and automated reports
• active and knowledgeable online community for support issues
• detailed online documentation

Limitations and Data Requirements

• The program provides the functions and libraries to read and process data from a variety of sources including, but not limited to, ASCII files, binary files, spreadsheets, and databases.
• As long as the data format and structure are known, data can be imported into the R environment.
• The environment is challenging for the first-time user and presents a steep initial learning curve.

References

Faraway, J. 2002. Practical Regression and ANOVA Using R. http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf.

Hornik, K. 2013. The R FAQ. http://CRAN.R-project.org/doc/FAQ/R-FAQ.html.

R Development Core Team. 2008. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. http://www.r-project.org.

Venables, W.N., D.M. Smith, and the R Core Team. 2013. An Introduction to R. Notes on R: A Programming Environment for Data Analysis and Graphics. Version 3.0.1.

Publication Date: December 2013

Permission is granted to refer to or quote from this publication with the customary acknowledgment of the source (see suggested citation and disclaimer).

This web site is owned by ITRC.

ITRC is sponsored by the Environmental Council of the States.