How to Calculate Partial Correlation in R


In statistics, we often use the Pearson correlation coefficient to measure the linear relationship between two variables. However, sometimes we’re interested in understanding the relationship between two variables while controlling for a third variable.

For example, suppose we want to measure the association between the number of hours a student studies and the final exam score they receive, while controlling for the student’s current grade in the class. In this case, we could use a partial correlation to measure the relationship between hours studied and final exam score.

This tutorial explains how to calculate partial correlation in R.

Example: Partial Correlation in R

Suppose we have the following data frame that displays the current grade, total hours studied, and final exam score for 10 students:

#create data frame
df <- data.frame(currentGrade = c(82, 88, 75, 74, 93, 97, 83, 90, 90, 80),
                 hours = c(4, 3, 6, 5, 4, 5, 8, 7, 4, 6),
                 examScore = c(88, 85, 76, 70, 92, 94, 89, 85, 90, 93))

#view data frame
df

   currentGrade hours examScore
1            82     4        88
2            88     3        85
3            75     6        76
4            74     5        70
5            93     4        92
6            97     5        94
7            83     8        89
8            90     7        85
9            90     4        90
10           80     6        93
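
As a baseline, it can help to look at the ordinary Pearson correlations first, before any variable is held constant. The following is a minimal sketch using base R's cor() on the same data; the object name zero_order is only illustrative.

```r
# recreate the example data so this block runs on its own
df <- data.frame(currentGrade = c(82, 88, 75, 74, 93, 97, 83, 90, 90, 80),
                 hours = c(4, 3, 6, 5, 4, 5, 8, 7, 4, 6),
                 examScore = c(88, 85, 76, 70, 92, 94, 89, 85, 90, 93))

# ordinary (zero-order) Pearson correlations, nothing held constant
zero_order <- cor(df)

# view the correlation matrix rounded to three decimals
round(zero_order, 3)
```

On this data the raw correlation between hours and exam score should be close to zero (roughly -0.06), which gives a reference point for judging how much a relationship changes once a third variable is controlled for.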

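To calculate the partial correlation between each pairwise combination of variables in the data frame, controlling in each case for the remaining variable, we can use the pcor() function from the ppcor package:

#load the ppcor package (install it once with install.packages("ppcor"))
library(ppcor)

#calculate partial correlations
pcor(df)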

$estimate
             currentGrade      hours examScore
currentGrade    1.0000000 -0.3112341 0.7355673
hours          -0.3112341  1.0000000 0.1906258
examScore       0.7355673  0.1906258 1.0000000

$p.value
             currentGrade     hours  examScore
currentGrade   0.00000000 0.4149353 0.02389896
hours          0.41493532 0.0000000 0.62322848
examScore      0.02389896 0.6232285 0.00000000

$statistic
             currentGrade      hours examScore
currentGrade    0.0000000 -0.8664833 2.8727185
hours          -0.8664833  0.0000000 0.5137696
examScore       2.8727185  0.5137696 0.0000000

$n
[1] 10

$gp
[1] 1

$method
[1] "pearson"

Here is how to interpret the output:

Partial correlation between hours studied and final exam score:

The partial correlation between hours studied and final exam score is .191, which is a small positive correlation. As hours studied increases, exam score tends to increase as well, assuming current grade is held constant.

The p-value for this partial correlation is .623, which is not statistically significant at α = 0.05.
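
To see what "held constant" means here, one common way to think about a partial correlation is as the correlation between two sets of regression residuals. The sketch below uses only base R; the names hours_resid, exam_resid, and partial_r are illustrative and not part of ppcor.

```r
# recreate the example data so this block runs on its own
df <- data.frame(currentGrade = c(82, 88, 75, 74, 93, 97, 83, 90, 90, 80),
                 hours = c(4, 3, 6, 5, 4, 5, 8, 7, 4, 6),
                 examScore = c(88, 85, 76, 70, 92, 94, 89, 85, 90, 93))

# remove the part of each variable that current grade explains linearly
hours_resid <- resid(lm(hours ~ currentGrade, data = df))
exam_resid  <- resid(lm(examScore ~ currentGrade, data = df))

# the correlation of the residuals is the partial correlation
partial_r <- cor(hours_resid, exam_resid)
round(partial_r, 3)
```

The result should agree with the .191 reported by pcor() for hours and exam score.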

Partial correlation between current grade and final exam score:

The partial correlation between current grade and final exam score is .736, which is a strong positive correlation. As current grade increases, exam score tends to increase as well, assuming hours studied is held constant.

The p-value for this partial correlation is .024, which is statistically significant at α = 0.05.
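
The $statistic and $p.value entries can be checked by hand with the usual t-test for a partial correlation, which has n - 2 - gp degrees of freedom, where gp is the number of variables controlled for. This is a sketch under the assumption that pcor() uses this standard formula; it is consistent with the output shown above.

```r
r  <- 0.7355673   # partial correlation between currentGrade and examScore
n  <- 10          # sample size ($n in the output)
gp <- 1           # number of variables controlled for ($gp in the output)

# t statistic with n - 2 - gp degrees of freedom
t_stat <- r * sqrt((n - 2 - gp) / (1 - r^2))

# two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = n - 2 - gp)

round(c(t = t_stat, p = p_value), 4)
```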

Partial correlation between current grade and hours studied:

The partial correlation between current grade and hours studied is -.311, which is a mild negative correlation. As current grade increases, hours studied tends to decrease, assuming final exam score is held constant.

The p-value for this partial correlation is .415, which is not statistically significant at α = 0.05.

The output also tells us that the method used to calculate the partial correlation was “pearson.” Within the pcor() function, we could also use the method argument to specify “kendall” or “spearman” as alternative methods to calculate the correlations.
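
As a short sketch of choosing a different method, the example below uses pcor.test() from the same ppcor package to compute a single rank-based partial correlation between hours studied and exam score while controlling for current grade. The object name res_spearman is illustrative, and the ppcor package is assumed to be installed.

```r
library(ppcor)

# recreate the example data so this block runs on its own
df <- data.frame(currentGrade = c(82, 88, 75, 74, 93, 97, 83, 90, 90, 80),
                 hours = c(4, 3, 6, 5, 4, 5, 8, 7, 4, 6),
                 examScore = c(88, 85, 76, 70, 92, 94, 89, 85, 90, 93))

# Spearman partial correlation between hours and examScore,
# controlling for currentGrade
res_spearman <- pcor.test(df$hours, df$examScore, df$currentGrade,
                          method = "spearman")
res_spearman
```

The same method argument can be passed to pcor(), for example pcor(df, method = "kendall"), to get the full matrix of partial correlations.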

The complete documentation for the ppcor package is available on its CRAN page.
