How to Easily Calculate (And Interpret) Partial Correlations in R

This tutorial explains how to calculate (and interpret) partial correlations in R.

What is Partial Correlation?

Correlation is a measure of the strength and direction of a linear relationship between two continuous variables. Typically when we talk about correlation, we are talking about the Pearson Correlation Coefficient, which has a value between -1 and 1 where:

  • -1 indicates a perfectly negative linear correlation between two variables
  • 0 indicates no linear correlation between two variables
  • 1 indicates a perfectly positive linear correlation between two variables
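
As a quick illustration of these three cases, the base R function cor() returns the Pearson coefficient by default. The vectors below are made up for this illustration and are not part of the ice cream dataset used later.

```r
# Hypothetical toy vectors, used only to illustrate the three cases
x <- c(1, 2, 3, 4, 5)

r_pos  <- cor(x, 2 * x + 1)           # exact increasing linear relationship
r_neg  <- cor(x, -3 * x + 10)         # exact decreasing linear relationship
r_zero <- cor(x, c(1, -1, 0, -1, 1))  # symmetric pattern with no linear trend

print(c(r_pos, r_neg, r_zero))
```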

Partial correlation is similar to correlation, except it measures the correlation between two continuous variables while controlling for the effect of one or more other continuous variables.
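
For a single control variable z, the partial correlation between x and y can be written in terms of the three pairwise Pearson correlations. This is the standard first-order formula, given here as background; the symbols x, y, and z are generic and not tied to any particular dataset.

```latex
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}
```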

For example, suppose you are the owner of an ice cream truck and you want to understand whether there is a linear relationship between price and ice cream sales while controlling for other variables like temperature and the number of competitors. You’re only interested in the relationship between price and sales, but since you suspect that temperature and the number of competitors also influence sales, you’d like to understand this relationship while also controlling for these other variables.

The following example illustrates how to find partial correlations in R for a dataset that contains information about ice cream sales.

How to Find Partial Correlations in R

Suppose we have the following dataset that contains information about four variables for 14 days:

  • Price (the price of an ice cream cone in dollars on that given day)
  • Sales (the number of cones sold on that given day)
  • Temperature (the temperature in Fahrenheit on that given day)
  • Competitors (the number of competitor ice cream trucks also out on that given day)

#create data
data <- data.frame(price = c(2, 2, 1, 2, 3, 3, 2, 1, 1, 2, 1, 2, 2, 1),
                   sales = c(30, 34, 40, 30, 25, 22, 29, 31, 31, 21, 39, 27, 28, 38),
                   temp = c(80, 84, 90, 81, 76, 77, 78, 82, 84, 88, 69, 75, 80, 70),
                   comp = c(2, 5, 4, 2, 2, 3, 3, 1, 2, 4, 5, 2, 4, 2))

#view data
data

#   price sales temp comp
#1      2    30   80    2
#2      2    34   84    5
#3      1    40   90    4
#4      2    30   81    2
#5      3    25   76    2
#6      3    22   77    3
#7      2    29   78    3
#8      1    31   82    1
#9      1    31   84    2
#10     2    21   88    4
#11     1    39   69    5
#12     2    27   75    2
#13     2    28   80    4
#14     1    38   70    2

To find partial correlations in R, we can use the pcor.test() function from the ppcor package, which uses the following syntax:

pcor.test(first variable, second variable, control variables)

  • first variable and second variable are the two variables you want to find the partial correlation for
  • control variables is a list of one or more variables you want to control for

The following code illustrates how to find the partial correlation between price and sales while controlling for temperature and competitors:

#load ppcor library
library(ppcor)

#find partial correlation between price and sales; control for temp and comp
pcor.test(data$price, data$sales, list(data$temp, data$comp))

#    estimate     p.value statistic  n gp  Method
#1 -0.8119031 0.001339438 -4.397905 14  2 pearson

We receive the following output from pcor.test:

  • estimate: the partial correlation coefficient between two variables
  • p.value: the p-value of the correlation test (if the p-value is less than the significance level, e.g. 0.05, then the partial correlation is statistically significant)
  • statistic: the value of the test statistic for the correlation test
  • n: sample size
  • gp: the number of given (control) variables
  • Method: the correlation method used
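
To see how these columns relate to one another, the sketch below recomputes the test statistic and p-value from the estimate, n, and gp printed above. It assumes the usual t-test for a Pearson partial correlation with n - 2 - gp degrees of freedom, which reproduces the printed values; the variable names are chosen for this illustration.

```r
# Values copied from the pcor.test() output above
r  <- -0.8119031  # estimate
n  <- 14          # sample size
gp <- 2           # number of control variables

# t statistic with n - 2 - gp degrees of freedom
df     <- n - 2 - gp
t_stat <- r * sqrt(df / (1 - r^2))

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df)

print(c(statistic = t_stat, p.value = p_val))
```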

The partial correlation between price and sales while controlling for temperature and competitors is -0.8119. This is a fairly strong negative linear relationship. This means higher prices are associated with lower sales and lower prices are associated with higher sales, while controlling for temperature and competitors. 

The p-value for this partial correlation is 0.0013, which indicates a statistically significant partial correlation at the 0.05 significance level.
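
One way to see what "controlling for" means: a Pearson partial correlation equals the ordinary correlation between the residuals left over after regressing each of the two variables on the control variables. The sketch below uses only base R (lm(), resid(), cor()) on the same data; the object names are chosen for this illustration, and the result should agree with the estimate reported by pcor.test() above.

```r
# Same data as above
data <- data.frame(price = c(2, 2, 1, 2, 3, 3, 2, 1, 1, 2, 1, 2, 2, 1),
                   sales = c(30, 34, 40, 30, 25, 22, 29, 31, 31, 21, 39, 27, 28, 38),
                   temp = c(80, 84, 90, 81, 76, 77, 78, 82, 84, 88, 69, 75, 80, 70),
                   comp = c(2, 5, 4, 2, 2, 3, 3, 1, 2, 4, 5, 2, 4, 2))

# Remove the linear effect of temp and comp from each variable
price_resid <- resid(lm(price ~ temp + comp, data = data))
sales_resid <- resid(lm(sales ~ temp + comp, data = data))

# Correlate what is left over
partial_r <- cor(price_resid, sales_resid)
print(partial_r)
```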

Note that we could also find the partial correlation between price and sales while controlling for just one variable, such as temperature:

#find partial correlation between price and sales; control for temp
pcor.test(data$price, data$sales, data$temp)

#    estimate     p.value statistic  n gp  Method
#1 -0.7874614 0.001395208 -4.237292 14  1 pearson
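
With a single control variable, the same estimate can be assembled by hand from the three pairwise Pearson correlations using the standard first-order partial correlation formula. The sketch below is a base R cross-check, not a description of how ppcor computes the value internally; the object names are chosen for this illustration.

```r
# Same data as above
data <- data.frame(price = c(2, 2, 1, 2, 3, 3, 2, 1, 1, 2, 1, 2, 2, 1),
                   sales = c(30, 34, 40, 30, 25, 22, 29, 31, 31, 21, 39, 27, 28, 38),
                   temp = c(80, 84, 90, 81, 76, 77, 78, 82, 84, 88, 69, 75, 80, 70),
                   comp = c(2, 5, 4, 2, 2, 3, 3, 1, 2, 4, 5, 2, 4, 2))

# Pairwise Pearson correlations
r_ps <- cor(data$price, data$sales)  # price vs sales
r_pt <- cor(data$price, data$temp)   # price vs temp
r_st <- cor(data$sales, data$temp)   # sales vs temp

# First-order partial correlation of price and sales given temp
partial_r <- (r_ps - r_pt * r_st) / sqrt((1 - r_pt^2) * (1 - r_st^2))
print(partial_r)
```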

In addition, we could find the partial correlation for each combination of variables by simply using the pcor() function: 

pcor(data)

#$estimate
#           price      sales       temp      comp
#price  1.0000000 -0.8119031 -0.3305925 0.3209308
#sales -0.8119031  1.0000000 -0.3684723 0.4022136
#temp  -0.3305925 -0.3684723  1.0000000 0.2635285
#comp   0.3209308  0.4022136  0.2635285 1.0000000
#
#$p.value
#            price       sales      temp      comp
#price 0.000000000 0.001339438 0.2939198 0.3090995
#sales 0.001339438 0.000000000 0.2385714 0.1949166
#temp  0.293919752 0.238571408 0.0000000 0.4078932
#comp  0.309099542 0.194916639 0.4078932 0.0000000
#
#$statistic
#          price     sales       temp      comp
#price  0.000000 -4.397905 -1.1077078 1.0715548
#sales -4.397905  0.000000 -1.2534027 1.3892378
#temp  -1.107708 -1.253403  0.0000000 0.8638872
#comp   1.071555  1.389238  0.8638872 0.0000000
#
#$n
#[1] 14
#
#$gp
#[1] 2
#
#$method
#[1] "pearson"

The first matrix in the output shows the partial correlation for each combination of variables while controlling for all other variables in the dataset. For example:

  • The partial correlation between price and sales while controlling for temperature and competitors is -0.8119.
  • The partial correlation between price and temperature while controlling for sales and competitors is -0.3306.
  • The partial correlation between price and number of competitors while controlling for temperature and sales is 0.3209.
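
The full matrix of Pearson partial correlations can also be obtained from the inverse of the ordinary correlation matrix, which is a standard identity. The sketch below uses base R only (cor(), solve(), diag(), outer()); it is meant as a cross-check of the $estimate matrix, and the object names are chosen for this illustration.

```r
# Same data as above
data <- data.frame(price = c(2, 2, 1, 2, 3, 3, 2, 1, 1, 2, 1, 2, 2, 1),
                   sales = c(30, 34, 40, 30, 25, 22, 29, 31, 31, 21, 39, 27, 28, 38),
                   temp = c(80, 84, 90, 81, 76, 77, 78, 82, 84, 88, 69, 75, 80, 70),
                   comp = c(2, 5, 4, 2, 2, 3, 3, 1, 2, 4, 5, 2, 4, 2))

# Invert the ordinary correlation matrix
prec <- solve(cor(data))

# Scale by the diagonal and flip the sign of the off-diagonal entries
d <- sqrt(diag(prec))
partial <- -prec / outer(d, d)
diag(partial) <- 1

print(round(partial, 4))
```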

The second matrix in the output shows the p-value for each test of partial correlation. For example:

  • The p-value for the partial correlation between price and sales while controlling for temperature and competitors is 0.0013. This is significant at the 0.05 significance level.
  • The p-value for the partial correlation between price and temperature while controlling for sales and competitors is 0.2939. This is not significant at the 0.05 significance level.
  • The p-value for the partial correlation between price and number of competitors while controlling for temperature and sales is 0.3091. This is not significant at the 0.05 significance level.

To find out more about the ppcor package, see its documentation on CRAN.
