How to Calculate Mahalanobis Distance in R


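The Mahalanobis distance measures how far a point lies from the center of a multivariate distribution, taking into account the variance of each variable and the correlations between them. It’s often used to find outliers in statistical analyses that involve several variables.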
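
For reference, the usual textbook definition can be written as below. The symbols are notation introduced here for illustration, not taken from this tutorial: x is an observation, μ is the mean vector, and Σ is the covariance matrix.

```latex
D^2(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}),
\qquad
D(\mathbf{x}) = \sqrt{D^2(\mathbf{x})}
```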

This tutorial explains how to calculate the Mahalanobis distance in R.

Example: Mahalanobis Distance in R

Use the following steps to calculate the Mahalanobis distance for every observation in a dataset in R.

Step 1: Create the dataset.

First, we’ll create a dataset that displays the exam score of 20 students along with the number of hours they spent studying, the number of prep exams they took, and their current grade in the course:

#create data
df = data.frame(score = c(91, 93, 72, 87, 86, 73, 68, 87, 78, 99, 95, 76, 84, 96, 76, 80, 83, 84, 73, 74),
        hours = c(16, 6, 3, 1, 2, 3, 2, 5, 2, 5, 2, 3, 4, 3, 3, 3, 4, 3, 4, 4),
        prep = c(3, 4, 0, 3, 4, 0, 1, 2, 1, 2, 3, 3, 3, 2, 2, 2, 3, 3, 2, 2),
        grade = c(70, 88, 80, 83, 88, 84, 78, 94, 90, 93, 89, 82, 95, 94, 81, 93, 93, 90, 89, 89))

#view first six rows of data
head(df)

  score hours prep grade
1    91    16    3    70
2    93     6    4    88
3    72     3    0    80
4    87     1    3    83
5    86     2    4    88
6    73     3    0    84

Step 2: Calculate the Mahalanobis distance for each observation.

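Next, we’ll use the built-in mahalanobis() function in R to calculate the Mahalanobis distance for each observation. Note that this function returns the squared distance, so take the square root if you need the distance itself. The function uses the following syntax: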

mahalanobis(x, center, cov)

where:

  • x: vector or matrix of data, with one row per observation (a data frame also works)
  • center: mean vector of the distribution
  • cov: covariance matrix of the distribution
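
As a quick sanity check of this syntax, the sketch below calls mahalanobis() on a single made-up point with an identity covariance matrix. The values are illustrative only and are not part of the tutorial's dataset. In this special case the result equals the squared Euclidean distance from the center:

```r
#made-up point, center, and identity covariance matrix (illustrative values only)
point  <- c(3, 4)
center <- c(0, 0)
sigma  <- diag(2)

#squared Mahalanobis distance: 3^2 + 4^2
d2 <- mahalanobis(point, center, sigma)
d2
#[1] 25

#take the square root to get the distance itself
sqrt(d2)
#[1] 5
```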

The following code shows how to implement this function for our dataset:

#calculate Mahalanobis distance for each observation
mahalanobis(df, colMeans(df), cov(df))

 [1] 16.5019630  2.6392864  4.8507973  5.2012612  3.8287341  4.0905633
 [7]  4.2836303  2.4198736  1.6519576  5.6578253  3.9658770  2.9350178
[13]  2.8102109  4.3682945  1.5610165  1.4595069  2.0245748  0.7502536
[19]  2.7351292  2.2642268
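
To see what the function does with these inputs, the following sketch recomputes the same values by hand from the column means and the inverse covariance matrix, then compares them with the built-in result. The variable names (X, centered, S_inv, d2_manual, d2_builtin) are chosen here for illustration:

```r
#recreate the dataset from Step 1
df <- data.frame(score = c(91, 93, 72, 87, 86, 73, 68, 87, 78, 99, 95, 76, 84, 96, 76, 80, 83, 84, 73, 74),
                 hours = c(16, 6, 3, 1, 2, 3, 2, 5, 2, 5, 2, 3, 4, 3, 3, 3, 4, 3, 4, 4),
                 prep = c(3, 4, 0, 3, 4, 0, 1, 2, 1, 2, 3, 3, 3, 2, 2, 2, 3, 3, 2, 2),
                 grade = c(70, 88, 80, 83, 88, 84, 78, 94, 90, 93, 89, 82, 95, 94, 81, 93, 93, 90, 89, 89))

#subtract the column means from every row
X <- as.matrix(df)
centered <- sweep(X, 2, colMeans(X))

#invert the sample covariance matrix
S_inv <- solve(cov(X))

#row-wise quadratic form (x - mean)' S^-1 (x - mean)
d2_manual <- rowSums((centered %*% S_inv) * centered)

#compare with the built-in function
d2_builtin <- mahalanobis(df, colMeans(df), cov(df))
all.equal(unname(d2_manual), unname(d2_builtin))

#index of the observation with the largest squared distance
which.max(d2_builtin)
```

In the output shown above, the first observation has by far the largest value (about 16.5), so it is the natural candidate to inspect as a potential outlier.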

Step 3: Calculate the p-value for each Mahalanobis distance.
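
One common way to carry out this step is sketched below. It assumes the data are roughly multivariate normal, in which case the squared distances approximately follow a chi-square distribution. The degrees of freedom are set to the number of variables here, which is an assumption of this sketch (some references use a different value), and the 0.001 cutoff is a convention rather than a rule:

```r
#recreate the dataset from Step 1
df <- data.frame(score = c(91, 93, 72, 87, 86, 73, 68, 87, 78, 99, 95, 76, 84, 96, 76, 80, 83, 84, 73, 74),
                 hours = c(16, 6, 3, 1, 2, 3, 2, 5, 2, 5, 2, 3, 4, 3, 3, 3, 4, 3, 4, 4),
                 prep = c(3, 4, 0, 3, 4, 0, 1, 2, 1, 2, 3, 3, 3, 2, 2, 2, 3, 3, 2, 2),
                 grade = c(70, 88, 80, 83, 88, 84, 78, 94, 90, 93, 89, 82, 95, 94, 81, 93, 93, 90, 89, 89))

#squared Mahalanobis distance for each observation
d2 <- mahalanobis(df, colMeans(df), cov(df))

#assumed degrees of freedom: the number of variables
k <- ncol(df)

#upper-tail chi-square probability for each squared distance
pvalue <- pchisq(d2, df = k, lower.tail = FALSE)

#collect the results and flag observations below the assumed 0.001 cutoff
result <- data.frame(df, mahal = d2, pvalue = pvalue)
result$outlier <- result$pvalue < 0.001
head(result)
```

A smaller p-value means the observation sits further from the center of the data than would be expected under the assumed distribution.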
