How to Perform White’s Test in Python (Step-by-Step)


White’s test is used to determine if heteroscedasticity is present in a regression model.

Heteroscedasticity means that the residuals do not have constant variance: their scatter changes across the levels of the predictor variables or the fitted values. This violates the linear regression assumption that the residuals are equally scattered (homoscedastic) at every level.
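To make the definition concrete, here is a minimal sketch that simulates heteroscedastic data and compares the residual variance for low and high values of a predictor. The data are synthetic (not the mtcars dataset), and the choice of an error standard deviation proportional to x is an assumption made purely for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration): error spread grows with x
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 10.0, size=n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n) * x

# Fit a simple linear regression by ordinary least squares
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Compare residual variance in the lower and upper halves of x
order = np.argsort(x)
half = n // 2
var_low = resid[order[:half]].var()
var_high = resid[order[half:]].var()

print(f"residual variance, lower half of x: {var_low:.2f}")
print(f"residual variance, upper half of x: {var_high:.2f}")
```

If the residuals were homoscedastic, the two variances would be roughly equal. Here the upper half should show a clearly larger variance.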

The following step-by-step example shows how to perform White’s test in Python to determine whether or not heteroscedasticity is a problem in a given regression model.

Step 1: Load Data

In this example we will fit a multiple linear regression model using the mtcars dataset.

The following code shows how to load this dataset into a pandas DataFrame:

from statsmodels.stats.diagnostic import het_white
import statsmodels.api as sm
import pandas as pd

#define URL where dataset is located
url = "https://raw.githubusercontent.com/Statology/Python-Guides/main/mtcars.csv"

#read in data
data = pd.read_csv(url)

#view summary of data
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   model   32 non-null     object 
 1   mpg     32 non-null     float64
 2   cyl     32 non-null     int64  
 3   disp    32 non-null     float64
 4   hp      32 non-null     int64  
 5   drat    32 non-null     float64
 6   wt      32 non-null     float64
 7   qsec    32 non-null     float64
 8   vs      32 non-null     int64  
 9   am      32 non-null     int64  
 10  gear    32 non-null     int64  
 11  carb    32 non-null     int64  
dtypes: float64(5), int64(6), object(1)

Step 2: Fit Regression Model

Next, we will fit a regression model using mpg as the response variable and disp and hp as the two predictor variables:

#define response variable
y = data['mpg']

#define predictor variables
x = data[['disp', 'hp']]

#add constant to predictor variables
x = sm.add_constant(x)

#fit regression model
model = sm.OLS(y, x).fit()

Step 3: Perform White’s Test

Next, we will use the het_white() function from the statsmodels package to perform White’s test to determine if heteroscedasticity is present in the regression model:

#perform White's test
white_test = het_white(model.resid, model.model.exog)

#define labels to use for output of White's test
labels = ['Test Statistic', 'Test Statistic p-value', 'F-Statistic', 'F-Test p-value']

#print results of White's test
print(dict(zip(labels, white_test)))

{'Test Statistic': 7.076620330416624, 'Test Statistic p-value': 0.21500404394263936,
 'F-Statistic': 1.4764621093131864, 'F-Test p-value': 0.23147065943879694}

Here is how to interpret the output:

  • The test statistic is χ² = 7.0766.
  • The corresponding p-value is 0.215.

White’s test uses the following null and alternative hypotheses:

  • Null (H0): Homoscedasticity is present (residuals are equally scattered)
  • Alternative (HA): Heteroscedasticity is present (residuals are not equally scattered)

Since the p-value (0.215) is not less than the significance level of 0.05, we fail to reject the null hypothesis.

This means we do not have sufficient evidence to say that heteroscedasticity is present in the regression model.
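For readers curious where the test statistic comes from, the sketch below computes the textbook form of White's statistic by hand: regress the squared residuals on the original predictors, their squares, and their cross-products, then multiply the R-squared of that auxiliary regression by the number of observations. The helper name white_lm_statistic and the synthetic data are assumptions for illustration; this is not a substitute for het_white(), and it does not compute the p-value, which comes from a chi-square distribution with the returned degrees of freedom.

```python
import numpy as np

def white_lm_statistic(resid, exog):
    """Return (LM statistic, degrees of freedom) for White's test.

    exog is the design matrix of the original regression, including
    the constant column (like model.model.exog above).
    Hypothetical helper written for illustration only.
    """
    resid = np.asarray(resid, dtype=float)
    exog = np.asarray(exog, dtype=float)
    n, k = exog.shape

    # All pairwise column products: constant, levels, squares, cross-products
    i, j = np.triu_indices(k)
    aux = exog[:, i] * exog[:, j]

    # Auxiliary regression of squared residuals on those terms
    y_aux = resid ** 2
    coef, *_ = np.linalg.lstsq(aux, y_aux, rcond=None)
    ss_resid = np.sum((y_aux - aux @ coef) ** 2)
    ss_total = np.sum((y_aux - y_aux.mean()) ** 2)
    r_squared = 1.0 - ss_resid / ss_total

    # Degrees of freedom: number of auxiliary terms excluding the constant
    df = np.linalg.matrix_rank(aux) - 1
    return n * r_squared, df

# Synthetic example with two predictors (not the mtcars data)
rng = np.random.default_rng(1)
n = 32
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
exog = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(exog, y, rcond=None)
resid = y - exog @ beta

lm, df = white_lm_statistic(resid, exog)
print(f"LM statistic = {lm:.4f}, degrees of freedom = {df}")
```

With two predictors, the auxiliary regression contains five non-constant terms (two levels, two squares, one cross-product), so the statistic is compared against a chi-square distribution with 5 degrees of freedom.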

What To Do Next

If you fail to reject the null hypothesis of White’s test, you do not have sufficient evidence that heteroscedasticity is present, and you can proceed to interpret the output of the original regression.

However, if you reject the null hypothesis, there is evidence that heteroscedasticity is present. In this case, the standard errors that are shown in the output table of the regression may be unreliable.

There are two common ways to fix this issue:

1. Transform the response variable.

You can try performing a transformation on the response variable, such as taking the log, square root, or cube root of the response variable. This often causes heteroscedasticity to go away.

2. Use weighted regression.

Weighted regression assigns a weight to each data point based on the variance of its residual, typically the inverse of that variance. This gives smaller weights to data points that have higher variances, which shrinks their contribution to the sum of squared residuals. When the proper weights are used, this can eliminate the problem of heteroscedasticity.
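Both fixes can be sketched on simulated data. The example below is a minimal illustration using only NumPy. The helper names (ols, wls, spread_ratio), the synthetic data, and the weights of 1 / x**2 are all assumptions chosen so that the true error structure is known; with real data the appropriate transformation or weights have to be estimated or justified. In practice, statsmodels provides sm.WLS for weighted regression.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residuals for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def wls(X, y, w):
    """Weighted least squares: minimize sum(w * resid**2) by rescaling rows."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def spread_ratio(x, resid):
    """Residual variance in the upper half of x divided by the lower half."""
    order = np.argsort(x)
    half = len(x) // 2
    return resid[order[half:]].var() / resid[order[:half]].var()

rng = np.random.default_rng(0)
n = 400
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])

# Fix 1: multiplicative noise, which a log transform turns into additive noise
y_mult = np.exp(0.5 + 0.3 * x + rng.normal(0.0, 0.2, size=n))
_, resid_raw = ols(X, y_mult)
_, resid_log = ols(X, np.log(y_mult))
ratio_raw = spread_ratio(x, resid_raw)
ratio_log = spread_ratio(x, resid_log)
print(f"spread ratio, raw response: {ratio_raw:.2f}")
print(f"spread ratio, log response: {ratio_log:.2f}")

# Fix 2: error sd proportional to x, so weights are 1 / x**2 (inverse variance)
y_add = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n) * x
beta_ols, _ = ols(X, y_add)
beta_wls = wls(X, y_add, 1.0 / x**2)
print("OLS coefficients:", np.round(beta_ols, 3))
print("WLS coefficients:", np.round(beta_wls, 3))
```

A spread ratio near 1 suggests roughly equal scatter across the range of x. Note that in this simulation the log model is also the correctly specified one, so the transformation fixes both the curvature and the unequal scatter.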

Additional Resources

The following tutorials provide additional information about linear regression in Python:

A Complete Guide to Linear Regression in Python
How to Create a Residual Plot in Python
How to Calculate VIF in Python
