How to Perform Multivariate Normality Tests in Python


When we’d like to test whether or not a single variable is normally distributed, we can create a Q-Q plot to visualize the distribution, or we can perform a formal statistical test such as the Anderson-Darling test or the Jarque-Bera test.
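As a rough illustration of the single-variable case, the sketch below runs these checks with scipy.stats. The sample x, its size, and the seed are assumptions made up for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
x = rng.normal(size=200)         # hypothetical sample of a single variable

# Jarque-Bera test: H0 is that skewness and kurtosis match a normal distribution
jb_stat, jb_pval = stats.jarque_bera(x)

# Anderson-Darling test: the statistic is compared against tabulated critical values
ad = stats.anderson(x, dist='norm')

# Q-Q plot coordinates (theoretical vs. ordered sample quantiles) plus the fitted line
(theoretical_q, ordered_x), (slope, intercept, r) = stats.probplot(x, dist='norm')

print(f"Jarque-Bera: statistic={jb_stat:.3f}, p-value={jb_pval:.3f}")
print(f"Anderson-Darling: statistic={ad.statistic:.3f}")
print(f"Q-Q plot correlation: r={r:.3f}")
```

A Q-Q correlation close to 1 and a large Jarque-Bera p-value are both consistent with normality for that one variable, but neither says anything about how several variables behave jointly.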

However, when we’d like to test whether or not several variables are jointly normally distributed as a group, we must perform a multivariate normality test.

This tutorial explains how to perform the Henze-Zirkler multivariate normality test for a given dataset in Python.

Related: If we’d like to identify outliers in a multivariate setting, we can use the Mahalanobis distance.
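For readers who want to see the idea in code, here is a minimal sketch of Mahalanobis-distance outlier screening with NumPy and SciPy. The data X, the seed, and the 0.999 chi-square cutoff are assumptions chosen for illustration, not a recommended rule.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)   # fixed seed so the sketch is repeatable
X = rng.normal(size=(50, 3))     # hypothetical data: 50 rows, 3 variables

# center the data and invert the sample covariance matrix
diff = X - X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

# squared Mahalanobis distance of each row from the sample mean
d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# under multivariate normality, d2 is approximately chi-square with p degrees of freedom
cutoff = chi2.ppf(0.999, df=X.shape[1])   # assumed cutoff for this example
outliers = np.where(d2 > cutoff)[0]

print("Row indices flagged as potential outliers:", outliers)
```

Rows whose squared distance exceeds the cutoff are candidates for a closer look rather than automatic removal.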

Example: Henze-Zirkler Multivariate Normality Test in Python

The Henze-Zirkler Multivariate Normality Test assesses whether or not a group of variables follows a multivariate normal distribution. The null and alternative hypotheses for the test are as follows:

H0 (null): The variables follow a multivariate normal distribution.

Ha (alternative): The variables do not follow a multivariate normal distribution.

To perform this test in Python we can use the multivariate_normality() function from the pingouin library.

First, we need to install pingouin:

pip install pingouin

Next, we can import the multivariate_normality() function and use it to perform the Henze-Zirkler test on a given dataset:

#import necessary packages
from pingouin import multivariate_normality
import pandas as pd
import numpy as np

#create a dataset with three variables x1, x2, and x3
df = pd.DataFrame({'x1': np.random.normal(size=50),
                   'x2': np.random.normal(size=50),
                   'x3': np.random.normal(size=50)})

#perform the Henze-Zirkler Multivariate Normality Test
multivariate_normality(df, alpha=.05)

HZResults(hz=0.5956866563391165, pval=0.6461804077893423, normal=True)

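The results of the test are as follows (because the data are generated randomly without a seed, the values will differ each time the code is run):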

  • H-Z Test Statistic: 0.59569
  • p-value: 0.64618

Since the p-value of the test is not less than our specified alpha value of .05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that the dataset departs from a multivariate normal distribution, so it is reasonable to proceed as if the assumption holds.

Related: Learn how the Henze-Zirkler test is used in real-life medical applications in this research paper.
