How to Perform a Shapiro-Wilk Test in Python


The Shapiro-Wilk test is a test of normality. It is used to determine whether or not a sample comes from a normal distribution.

To perform a Shapiro-Wilk test in Python, we can use the scipy.stats.shapiro() function, which uses the following syntax:

scipy.stats.shapiro(x)

where:

  • x: An array of sample data.

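This function returns a test statistic and a corresponding p-value. The null hypothesis of the test is that the sample comes from a normal distribution. If the p-value is below the chosen significance level (commonly .05), we reject the null hypothesis and conclude that we have sufficient evidence to say that the sample data does not come from a normal distribution.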
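The decision rule described above can be sketched in a few lines. This is only an illustration: the helper name shapiro_decision, the default significance level of 0.05, and the exponential sample are assumptions made for this example and are not part of SciPy.

```python
import numpy as np
from scipy.stats import shapiro


def shapiro_decision(x, alpha=0.05):
    """Run the Shapiro-Wilk test and report whether normality is rejected at level alpha.

    Hypothetical helper for illustration; alpha=0.05 is an assumed default.
    """
    statistic, p_value = shapiro(x)  # the result unpacks like a 2-tuple
    return {
        "statistic": float(statistic),
        "p_value": float(p_value),
        "reject_normality": bool(p_value < alpha),
    }


# Illustrative data only: a strongly right-skewed exponential sample
rng = np.random.default_rng(1)
skewed = rng.exponential(scale=1.0, size=500)

print(shapiro_decision(skewed))
```

Returning the decision together with the statistic and p-value keeps the significance level explicit, so the same sample can be checked against a stricter level such as alpha=0.01.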

This tutorial shows a couple of examples of how to use this function in practice.

Example 1: Shapiro-Wilk Test on Normally Distributed Data

Suppose we have the following sample data:

from numpy.random import seed
from numpy.random import randn

#set seed (i.e. make this example reproducible)
seed(0)

#generate dataset of 100 random values that follow a standard normal distribution
data = randn(100)

The following code shows how to perform a Shapiro-Wilk test on this sample of 100 data values to determine if it came from a normal distribution:

from scipy.stats import shapiro

#perform Shapiro-Wilk test
shapiro(data)

ShapiroResult(statistic=0.9926937818527222, pvalue=0.8689165711402893)

From the output we can see that the test statistic is 0.9927 and the corresponding p-value is 0.8689. Since the p-value is not less than .05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that the sample data does not come from a normal distribution.

This result shouldn’t be surprising since we generated the sample data using the randn() function, which generates random values that follow a standard normal distribution.

Example 2: Shapiro-Wilk Test on Non-Normally Distributed Data

Now suppose we have the following sample data:

from numpy.random import seed
from numpy.random import poisson

#set seed (i.e. make this example reproducible)
seed(0)

#generate dataset of 100 values that follow a Poisson distribution with mean=5
data = poisson(5, 100)

The following code shows how to perform a Shapiro-Wilk test on this sample of 100 data values to determine if it came from a normal distribution:

from scipy.stats import shapiro

#perform Shapiro-Wilk test
shapiro(data)

ShapiroResult(statistic=0.9581913948059082, pvalue=0.002994443289935589)

From the output we can see that the test statistic is 0.9582 and the corresponding p-value is 0.00299. Since the p-value is less than .05, we reject the null hypothesis. We have sufficient evidence to say that the sample data does not come from a normal distribution.

This result also shouldn’t be surprising since we generated the sample data using the poisson() function, which generates random values that follow a Poisson distribution.
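As a rough recap, the two examples can be run side by side in one self-contained script. This sketch assumes the same seed and sample sizes used above. The names alpha, normal_data, poisson_data and results are introduced here for illustration. The printed values should be close to those reported in the examples, though the last digits may vary with the SciPy version.

```python
import numpy as np
from scipy.stats import shapiro

alpha = 0.05  # significance level used in both examples

# Recreate the two samples, resetting the seed before each draw as the examples do
np.random.seed(0)
normal_data = np.random.randn(100)

np.random.seed(0)
poisson_data = np.random.poisson(5, 100)

results = {}
for name, sample in [("normal", normal_data), ("poisson", poisson_data)]:
    result = shapiro(sample)
    results[name] = result
    # The result exposes its fields as attributes as well as by position
    verdict = "reject" if result.pvalue < alpha else "fail to reject"
    print(f"{name}: W={result.statistic:.4f}, p={result.pvalue:.4f} -> {verdict} normality")
```

Reading result.statistic and result.pvalue by name avoids relying on the order of the returned values.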

Additional Resources

Shapiro-Wilk Test Calculator
How to Perform a Shapiro-Wilk Test in R
How to Perform an Anderson-Darling Test in Python
How to Perform a Kolmogorov-Smirnov Test in Python
