The **Mahalanobis distance** is the distance between two points in a multivariate space.

It is often used to find outliers in statistical analyses that involve several variables.

This tutorial explains how to calculate the Mahalanobis distance in R.

**Example: Mahalanobis Distance in R**

Use the following steps to calculate the Mahalanobis distance for every observation in a dataset in R.

**Step 1: Create the dataset.**

First, we’ll create a dataset that displays the exam score of 20 students along with the number of hours they spent studying, the number of prep exams they took, and their current grade in the course:

    #create data
    df = data.frame(score = c(91, 93, 72, 87, 86, 73, 68, 87, 78, 99, 95, 76, 84, 96, 76, 80, 83, 84, 73, 74),
                    hours = c(16, 6, 3, 1, 2, 3, 2, 5, 2, 5, 2, 3, 4, 3, 3, 3, 4, 3, 4, 4),
                    prep = c(3, 4, 0, 3, 4, 0, 1, 2, 1, 2, 3, 3, 3, 2, 2, 2, 3, 3, 2, 2),
                    grade = c(70, 88, 80, 83, 88, 84, 78, 94, 90, 93, 89, 82, 95, 94, 81, 93, 93, 90, 89, 89))

    #view first six rows of data
    head(df)

      score hours prep grade
    1    91    16    3    70
    2    93     6    4    88
    3    72     3    0    80
    4    87     1    3    83
    5    86     2    4    88
    6    73     3    0    84

**Step 2: Calculate the Mahalanobis distance for each observation.**

Next, we’ll use the built-in mahalanobis() function in R to calculate the Mahalanobis distance for each observation. Note that this function returns the squared distance. It uses the following syntax:

**mahalanobis(x, center, cov)**

where:

**x:** matrix of data

**center:** mean vector of the distribution

**cov:** covariance matrix of the distribution

The following code shows how to implement this function for our dataset:

    #calculate Mahalanobis distance for each observation
    mahalanobis(df, colMeans(df), cov(df))

     [1] 16.5019630  2.6392864  4.8507973  5.2012612  3.8287341  4.0905633
     [7]  4.2836303  2.4198736  1.6519576  5.6578253  3.9658770  2.9350178
    [13]  2.8102109  4.3682945  1.5610165  1.4595069  2.0245748  0.7502536
    [19]  2.7351292  2.2642268
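To see what the function computes, the same values can be reproduced by hand. The sketch below is only an illustration: it centers each column at its mean, inverts the sample covariance matrix, and evaluates the quadratic form (x - mean)' S^-1 (x - mean) for every row. The names `centered`, `cov_inv`, `d2_manual`, and `d2_builtin` are introduced here and are not part of the tutorial.

```r
# Recreate the tutorial's dataset so this sketch runs on its own
df <- data.frame(
  score = c(91, 93, 72, 87, 86, 73, 68, 87, 78, 99, 95, 76, 84, 96, 76, 80, 83, 84, 73, 74),
  hours = c(16, 6, 3, 1, 2, 3, 2, 5, 2, 5, 2, 3, 4, 3, 3, 3, 4, 3, 4, 4),
  prep  = c(3, 4, 0, 3, 4, 0, 1, 2, 1, 2, 3, 3, 3, 2, 2, 2, 3, 3, 2, 2),
  grade = c(70, 88, 80, 83, 88, 84, 78, 94, 90, 93, 89, 82, 95, 94, 81, 93, 93, 90, 89, 89)
)

# Subtract each column's mean from that column
centered <- sweep(as.matrix(df), 2, colMeans(df))

# Invert the sample covariance matrix
cov_inv <- solve(cov(df))

# Quadratic form (x - mean)' S^-1 (x - mean), evaluated row by row
d2_manual <- rowSums((centered %*% cov_inv) * centered)

# Compare with the built-in function (both are squared distances)
d2_builtin <- mahalanobis(df, colMeans(df), cov(df))
all.equal(unname(d2_manual), unname(d2_builtin))
```

The final comparison should report that the two vectors agree up to floating-point tolerance. Taking `sqrt(d2_manual)` would give the distance on the original, unsquared scale.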

**Step 3: Calculate the p-value for each Mahalanobis distance.**
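One common way to do this, sketched here under stated assumptions, is to compare each squared distance with a Chi-Square distribution using pchisq(). The sketch assumes the data are roughly multivariate normal and uses degrees of freedom equal to the number of variables (4 here); that choice is an assumption of this sketch. The names `d2`, `k`, `p_values`, and `result` are illustrative.

```r
# Recreate the tutorial's dataset so this sketch runs on its own
df <- data.frame(
  score = c(91, 93, 72, 87, 86, 73, 68, 87, 78, 99, 95, 76, 84, 96, 76, 80, 83, 84, 73, 74),
  hours = c(16, 6, 3, 1, 2, 3, 2, 5, 2, 5, 2, 3, 4, 3, 3, 3, 4, 3, 4, 4),
  prep  = c(3, 4, 0, 3, 4, 0, 1, 2, 1, 2, 3, 3, 3, 2, 2, 2, 3, 3, 2, 2),
  grade = c(70, 88, 80, 83, 88, 84, 78, 94, 90, 93, 89, 82, 95, 94, 81, 93, 93, 90, 89, 89)
)

# Squared Mahalanobis distance for each observation
d2 <- mahalanobis(df, colMeans(df), cov(df))

# Assumption: degrees of freedom = number of variables
k <- ncol(df)

# Upper-tail probability of a distance at least this large
p_values <- pchisq(d2, df = k, lower.tail = FALSE)

# Attach the distances and p-values to the data
result <- cbind(df, mahal = d2, p = p_values)

# Show the observations with the smallest p-values first
head(result[order(result$p), ])
```

Observations with very small p-values are candidates for outliers. Where to draw the line is a judgment call, and any cutoff should be chosen before looking at the results.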

What’s the difference between using:

(1) mahalanobis(df, colMeans(df), cov(df)); and

(2) mahalanobis(df, center=T, cov(df))

?

I am asking because I’m getting different values for the two options.

This article was very useful for my work. Thanks a lot.

Is it possible to get a PDF file that explains the Mahalanobis distance?