In small datasets, outliers may lie two or more standard deviations from the population mean. However, once a single outlier is removed from the population, the population mean and standard deviation change, so other data points that previously looked like outliers may now fall within the normal range. For this reason, the procedure works one value at a time. It selects the single most extreme value, measured by its deviation from the population mean, and then re-computes the population statistics to determine whether any of the remaining extreme values are also outliers.
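The one-value-at-a-time procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation used by Proc Outlier: the function name iterative_outliers is hypothetical, the cut-off of two standard deviations is taken from the opening sentence, the population (not sample) standard deviation is used, and the loop stops as soon as the most extreme remaining value lies inside the cut-off.

```python
from statistics import fmean, pstdev


def iterative_outliers(values, k=2.0):
    """Hypothetical sketch: repeatedly remove the single most extreme value
    while it lies at least k population standard deviations from the
    population mean, re-computing the statistics after every removal.

    Assumptions (not taken from Proc Outlier): k = 2, population standard
    deviation, and stopping at the first value found inside the cut-off.
    Returns (outliers in the order removed, remaining values).
    """
    remaining = list(values)  # copy so the caller's data is not modified
    outliers = []
    # With fewer than three points no value can be 2 population SDs out.
    while len(remaining) >= 3:
        mean = fmean(remaining)
        sd = pstdev(remaining, mu=mean)
        if sd == 0:
            break  # all remaining values are identical
        # Index of the value with the largest absolute deviation from the mean.
        idx = max(range(len(remaining)), key=lambda i: abs(remaining[i] - mean))
        if abs(remaining[idx] - mean) < k * sd:
            break  # most extreme value is within the normal range; stop
        outliers.append(remaining.pop(idx))
    return outliers, remaining


outliers, rest = iterative_outliers([1, 2, 1, 2, 1, 2, 1, 2, 10, 20])
print(outliers)  # [20, 10]
print(rest)      # [1, 2, 1, 2, 1, 2, 1, 2]
```

In this example the re-computation matters: on the first pass the value 10 lies just under one standard deviation from the mean (mean 4.2, SD about 5.86), so only 20 is flagged. After 20 is removed, the mean drops to about 2.44 and the SD to about 2.71, and 10 now lies well beyond two standard deviations, so it is flagged on the second pass.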
Syntax
proc outlier data=DATAFILE;
var VARIABLENAME;
out = OUTPUTRESULT;
Parameters Used
There are three parameters for the Outlier procedure:
DATAFILE - the data file to be analyzed
VARIABLENAME - the numeric variable to be analyzed
OUTPUTRESULT - name of the file in which to store the results of the analysis
Example Script
*;
* outlier2.ezs;
* determine any outliers in the census data;
* variable tested is number of households in the 2000 census;
libname test '{%libout}';
libname census '{%libin}';
proc outlier data=census.censuspart1;
var v0062000;
out = test.ol2;
run;
Analysts interested in Proc Outlier may also be interested in Proc Means, Proc Summary, Proc Benford and Proc Univariate.
There is also a web analytics version, which can be run directly from the Internet using data from Excel or another source in tab-separated value format. All software is provided at no cost.

