Outliers


In small datasets, an outlier is typically a value that lies two or more standard deviations from the population mean. Removing a single outlier, however, shifts the population mean and can narrow the standard deviation, so other data points previously thought to be outliers may then fall within the normal range. This procedure selects the single most extreme value, measured by its deviation from the population mean, and then recomputes the population statistics to determine whether any of the remaining extreme values are also outliers.
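
The remove-and-recompute idea described above can be sketched in a few lines of Python. This is only an illustration, not the code behind Proc Outlier: the function name iterative_outliers, the default threshold of 2.0 standard deviations, the use of the population standard deviation (statistics.pstdev), and the sample data are all assumptions made for the example.

```python
from statistics import mean, pstdev


def iterative_outliers(values, threshold=2.0):
    """Repeatedly remove the single most extreme value.

    Hypothetical helper: the threshold and the use of the population
    standard deviation are assumptions, not documented Proc Outlier behavior.
    Returns a tuple (outliers, remaining).
    """
    remaining = list(values)
    outliers = []
    while len(remaining) > 2:
        mu = mean(remaining)
        sigma = pstdev(remaining)
        if sigma == 0:
            break  # all remaining values are identical
        # The single most extreme value by deviation from the mean.
        extreme = max(remaining, key=lambda x: abs(x - mu))
        if abs(extreme - mu) / sigma < threshold:
            break  # nothing left beyond the threshold
        outliers.append(extreme)
        remaining.remove(extreme)
        # The loop now recomputes the mean and standard deviation
        # without the removed value.
    return outliers, remaining


# Made-up sample data for illustration only.
data = [10, 12, 11, 13, 12, 11, 10, 12, 50]
outliers, remaining = iterative_outliers(data)
print(outliers)   # [50]
print(remaining)  # [10, 12, 11, 13, 12, 11, 10, 12]
```

In this made-up sample, 50 lies about 2.8 standard deviations from the mean and is removed. Once the statistics are recomputed, the next most extreme value, 13, lies only about 1.6 standard deviations from the new mean, so the loop stops.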

Example screen shot:

Syntax

proc outlier data=DATAFILE;
var VARIABLENAME;
out = OUTPUTRESULT;
run;

Parameters Used

There are three parameters for the Outlier procedure:

DATAFILE - the datafile to be analyzed
VARIABLENAME - the numeric variable to be analyzed
OUTPUTRESULT - name of the file to store the results of the analysis

Example Script

*;
* outlier2.ezs;
* determine any outliers in the census data;
* variable tested is number of households in the 2000 census;
libname test '{%libout}';
libname census '{%libin}';
proc outlier data=census.censuspart1;
var v0062000;
out = test.ol2;
run;

Example Output

Analysts interested in Proc Outlier may also be interested in Proc Means, Proc Summary, Proc Benford and Proc Univariate.
 

There is also a web analytics version, which can be run directly from the Internet using Excel or another data source (in tab-separated value format). All software is provided at no cost.

Web Page last updated on 02-09-2007
© EZ-R Stats, LLC 2005-2007

Visit EZ-R Stats on the web at:

www.ezrstats.com