The Shapiro-Wilk test is used to determine whether or not a dataset follows a normal distribution.
The following step-by-step example shows how to perform a Shapiro-Wilk test for a dataset in SAS.
Step 1: Create the Data
First, we’ll create a dataset that contains 15 observations:
/*create dataset*/
data my_data;
    input x @@; /*@@ lets SAS read several values from one data line*/
    datalines;
3 3 4 6 7 8 8 9 12 14 15 15 17 20 21
;
run;

/*view dataset*/
proc print data=my_data;
run;
Step 2: Perform the Shapiro-Wilk Test
Next, we’ll use proc univariate with the normal option to perform a Shapiro-Wilk test for normality:
/*perform Shapiro-Wilk test*/
proc univariate data=my_data normal;
run;
The output provides us with a ton of information, but the only table we need to look at is the one titled Tests for Normality.
This table provides the test statistics and p-values for several normality tests including:
- The Shapiro-Wilk Test
- The Kolmogorov-Smirnov Test
- The Cramer-von Mises Test
- The Anderson-Darling Test
From this table we can see that the p-value for the Shapiro-Wilk test is .3452.
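If the rest of the proc univariate output is not needed, an ods select statement can limit the results to that one table. The following is a minimal sketch: TestsForNormality is the ODS name of the Tests for Normality table, and the var statement is added here only to make explicit which variable is tested.

```sas
/*show only the Tests for Normality table*/
ods select TestsForNormality;
proc univariate data=my_data normal;
    var x;
run;
```

The ods select statement applies only to the next procedure step, so later procedures print their full output as usual.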
Recall that a Shapiro-Wilk test uses the following null and alternative hypotheses:
- H0: The data is normally distributed.
- HA: The data is not normally distributed.
Since the p-value (.3452) is not less than .05, we fail to reject the null hypothesis.
This means we do not have sufficient evidence to say that the dataset is not normally distributed.
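In other words, it’s reasonable to treat the dataset as approximately normally distributed. Keep in mind that failing to reject the null hypothesis does not prove normality, especially with a sample of only 15 observations.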
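The same decision rule can be applied in code by saving the Tests for Normality table to a dataset. The following is a sketch under a few assumptions: norm_tests and sw_decision are hypothetical dataset names chosen for illustration, and the column names Test, Stat, and pValue are the ones the saved table is expected to contain (running proc contents on norm_tests will confirm them in your SAS version).

```sas
/*save the Tests for Normality table to a dataset*/
ods output TestsForNormality=norm_tests;
proc univariate data=my_data normal;
    var x;
run;

/*apply the .05 decision rule to the Shapiro-Wilk row*/
/*column names Test, Stat, pValue are assumed; verify with proc contents*/
data sw_decision;
    set norm_tests;
    where Test = 'Shapiro-Wilk';
    length decision $ 30;
    if pValue < 0.05 then decision = 'Reject H0';
    else decision = 'Fail to reject H0';
    keep Test Stat pValue decision;
run;

/*view the decision*/
proc print data=sw_decision;
run;
```

For the dataset in this example, the decision column should read Fail to reject H0, which matches the conclusion drawn from the table above.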