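The **Mahalanobis distance** is the distance between two points in a multivariate space, measured in a way that accounts for the variance of each variable and the correlations between them. It’s often used to find outliers in statistical analyses that involve several variables.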
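In symbols, the squared Mahalanobis distance between points x and y is usually written as below, where S is the covariance matrix of the variables (this notation is ours, not taken from the tutorial). Taking y to be the vector of variable means gives the distance of each observation from the center of the data, which is the version used for outlier detection. When S is the identity matrix, D reduces to the ordinary Euclidean distance.

```latex
D^2(\mathbf{x}, \mathbf{y}) = (\mathbf{x} - \mathbf{y})^{\top} S^{-1} (\mathbf{x} - \mathbf{y})
```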

This tutorial explains how to calculate the Mahalanobis distance in SPSS.

**Example: Mahalanobis Distance in SPSS**

Suppose we have the following dataset that displays the exam score of 20 students along with the number of hours they spent studying, the number of prep exams they took, and their current grade in the course:

We can use the following steps to calculate the Mahalanobis distance for each observation in the dataset to determine if there are any multivariate outliers.

**Step 1: Select the linear regression option.**

Click the **Analyze** tab, then **Regression**, then **Linear**:

**Step 2: Select the Mahalanobis option.**

Drag the response variable *score* into the box labelled Dependent. Drag the other three predictor variables into the box labelled Independent(s). Then click the **Save** button. In the window that pops up, check the box next to **Mahalanobis**, click **Continue**, and then click **OK**.

Once you click **OK**, the Mahalanobis distance for each observation in the dataset will appear in a new column titled **MAH_1**:

We can see that some of the distances are much larger than others. To determine if any of the distances are statistically significant, we need to calculate their p-values.
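To see what SPSS is computing in this step, here is a minimal pure-Python sketch of the squared Mahalanobis distance of each case's predictor values from the predictor means, using the sample covariance matrix (divisor n - 1). The function names `mahalanobis_squared` and `_solve` are our own, not part of any library. We assume this is the quantity SPSS stores in MAH_1, but we have not checked it against the tutorial's data.

```python
def _solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    # Build an augmented copy so the inputs are not modified.
    aug = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def mahalanobis_squared(rows):
    """Squared Mahalanobis distance of each row from the column means."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    # Sample covariance matrix (divisor n - 1).
    cov = [[sum(d[i] * d[j] for d in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    distances = []
    for d in centered:
        # Solve S z = d instead of inverting S; then D^2 = d . z.
        z = _solve(cov, d)
        distances.append(sum(di * zi for di, zi in zip(d, z)))
    return distances
```

For this tutorial, `rows` would hold one list per student with the three predictor values (hours studied, prep exams taken, current grade); the response variable *score* is not included, since the distances describe the predictors only.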

**Step 3: Calculate the p-values of each Mahalanobis distance.**

Click the **Transform** tab, then **Compute Variable**.

In the **Target Variable** box, choose a new name for the variable you’re creating. We chose “pvalue.” In the **Numeric Expression** box, type the following:

**1 - CDF.CHISQ(MAH_1, 3)**

Then click **OK**.

This gives, for each case, the probability of a Chi-Square value at least as large as its MAH_1 under a Chi-Square distribution with 3 degrees of freedom. We use **3** degrees of freedom because the distances are based on the 3 predictor variables in our regression model.
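Outside SPSS, the same upper-tail probability can be computed directly. The sketch below is a dependency-free Python function, `chi2_sf` (a name we chose), that evaluates P(X > x) for a chi-square variable with an integer number of degrees of freedom using the standard closed-form series for integer df. It is meant to mirror `1 - CDF.CHISQ(MAH_1, 3)`, not to replace a statistics library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses the recurrence SF_{k+2}(x) = SF_k(x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from SF_1(x) = erfc(sqrt(x/2)) or SF_2(x) = exp(-x/2).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        total, k = math.exp(-half), 2
    else:
        total, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        # Work on the log scale so large x or df does not overflow.
        total += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return total


# Hypothetical MAH_1 value, used only to show the call.
p_value = chi2_sf(16.3, 3)  # same quantity as 1 - CDF.CHISQ(16.3, 3)
```

Computing the upper tail directly avoids subtracting a CDF value very close to 1 from 1. With 3 degrees of freedom, the .001 cutoff corresponds to a chi-square value of about 16.27, so any MAH_1 above that value receives a p-value below .001.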

**Step 4: Interpret the p-values.**

Once you click **OK**, the p-value for each Mahalanobis distance will be displayed in a new column:

By default, SPSS displays the p-values to only two decimal places. You can increase the number of decimal places by clicking **Variable View** at the bottom of SPSS and increasing the number in the **Decimals** column:

Once you return to the **Data View**, you can see each p-value shown to five decimal places. Any observation whose p-value is **less than .001** is considered to be an outlier.
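The decision rule in this step fits in a few lines. The sketch below uses a hypothetical helper, `flag_outliers`, and made-up p-values (not the tutorial's results) to list the case numbers whose p-value falls below the .001 cutoff; the `alpha` parameter is our addition so the cutoff can be changed.

```python
def flag_outliers(p_values, alpha=0.001):
    """Return the 1-based case numbers whose p-value is below alpha."""
    return [case for case, p in enumerate(p_values, start=1) if p < alpha]


# Illustrative p-values only; these are not the tutorial's results.
example = [0.0004, 0.31, 0.02, 0.77]
print(flag_outliers(example))  # [1]
# At two decimals the first p-value would display as 0.00, which is why the
# extra decimal places matter before comparing against .001.
print(f"{example[0]:.2f}")  # 0.00
```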

We can see that the first observation is the only outlier in the dataset because it has a p-value less than .001:

**How to Handle Outliers**

If an outlier is present in your data, you have a couple of options:

**1. Make sure the outlier is not the result of a data entry error.**

Sometimes an individual simply enters the wrong value when recording data. If an outlier is present, first verify that the value was entered correctly.

**2. Remove the outlier.**

If the value is a true outlier, you may choose to remove it if it will have a significant impact on your overall analysis. Just make sure to mention in your final report or analysis that you removed an outlier.

Hi Zach,

As you mentioned, any p-value that is less than .001 is considered to be an outlier.

If the cutoff is 0.05 instead, will any p-value less than 0.05 be considered an outlier?

Thanks,

Regards,

Anjum

Thanks Zach, for this unusually clear exposition. I have a related question:

I’ve read that when comparing two samples like yours, but of widely diverging sizes or simply small ones, a correction involving D², sample size, and the number of predictors should be subtracted from the previously obtained between-samples D², generating a new D². Should significance be calculated on the first D² or on the corrected one?

Very useful information,

Thanks a lot!

The Mahalanobis distance should be defined in a short, understandable form, and it should be explained what it tells us.

With regards

THANK U