This tutorial explains how to calculate (and interpret) partial correlations in R.

**What is Partial Correlation?**

**Correlation** is a measure of the strength and direction of a linear relationship between two continuous variables. Typically, when we talk about correlation, we mean the Pearson correlation coefficient, which takes a value between -1 and 1, where:

- -1 indicates a perfectly negative linear correlation between two variables
- 0 indicates no linear correlation between two variables
- 1 indicates a perfectly positive linear correlation between two variables

**Partial correlation** is similar to correlation, except it measures the correlation between two continuous variables *while controlling for the effect of one or more other continuous variables*.

For example, suppose you are the owner of an ice cream truck and you want to understand whether there is a linear relationship between price and ice cream sales while controlling for other variables like temperature and the number of competitors. You’re only interested in the relationship between price and sales, but since you suspect that temperature and the number of competitors also influence sales, you’d like to understand this relationship while also controlling for these other variables.

The following example illustrates how to find partial correlations in R for a dataset that contains information about ice cream sales.

**How to Find Partial Correlations in R**

Suppose we have the following dataset that contains information about four variables for 14 days:

- **Price** (the price of an ice cream cone in dollars on that given day)
- **Sales** (the number of cones sold on that given day)
- **Temperature** (the temperature in Fahrenheit on that given day)
- **Competitors** (the number of competitor ice cream trucks also out on that given day)

    #create data
    data <- data.frame(price = c(2, 2, 1, 2, 3, 3, 2, 1, 1, 2, 1, 2, 2, 1),
                       sales = c(30, 34, 40, 30, 25, 22, 29, 31, 31, 21, 39, 27, 28, 38),
                       temp = c(80, 84, 90, 81, 76, 77, 78, 82, 84, 88, 69, 75, 80, 70),
                       comp = c(2, 5, 4, 2, 2, 3, 3, 1, 2, 4, 5, 2, 4, 2))

    #view data
    data

    #   price sales temp comp
    #1      2    30   80    2
    #2      2    34   84    5
    #3      1    40   90    4
    #4      2    30   81    2
    #5      3    25   76    2
    #6      3    22   77    3
    #7      2    29   78    3
    #8      1    31   82    1
    #9      1    31   84    2
    #10     2    21   88    4
    #11     1    39   69    5
    #12     2    27   75    2
    #13     2    28   80    4
    #14     1    38   70    2
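As a baseline before controlling for anything, it can help to look at the ordinary Pearson correlation between price and sales. The following is a minimal sketch using base R's `cor()` on the same dataset; the variable name `r_raw` is introduced here only for illustration.

```r
# Same dataset as above, repeated so this snippet runs on its own
data <- data.frame(price = c(2, 2, 1, 2, 3, 3, 2, 1, 1, 2, 1, 2, 2, 1),
                   sales = c(30, 34, 40, 30, 25, 22, 29, 31, 31, 21, 39, 27, 28, 38),
                   temp = c(80, 84, 90, 81, 76, 77, 78, 82, 84, 88, 69, 75, 80, 70),
                   comp = c(2, 5, 4, 2, 2, 3, 3, 1, 2, 4, 5, 2, 4, 2))

# Ordinary (zero-order) Pearson correlation: nothing is controlled for
r_raw <- cor(data$price, data$sales)
r_raw

# Pairwise Pearson correlations for all four variables
round(cor(data), 4)
```

The raw correlation between price and sales comes out at roughly -0.77, which gives a reference point for judging how much the control variables change the picture.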

To find partial correlations in R, we can use the **pcor.test()** function from the **ppcor** package, which uses the following syntax:

pcor.test(first variable, second variable, control variables)

- *first variable* and *second variable* are the two variables you want to find the partial correlation for
- *control variables* is a list of one or more variables you want to control for

The following code illustrates how to find the partial correlation between price and sales while controlling for temperature and competitors:

    #load ppcor library
    library(ppcor)

    #find partial correlation between price and sales; control for temp and comp
    pcor.test(data$price, data$sales, list(data$temp, data$comp))

    #    estimate     p.value statistic  n gp  Method
    #1 -0.8119031 0.001339438 -4.397905 14  2 pearson

We receive the following output from **pcor.test**:

- **estimate:** the partial correlation coefficient between the two variables
- **p.value:** the p-value of the correlation test (if the p-value is less than the chosen significance level, e.g. 0.05, the correlation is statistically significant)
- **statistic:** the value of the test statistic for the correlation test
- **n:** the sample size
- **gp:** the number of given (control) variables
- **Method:** the correlation method used

The partial correlation between price and sales while controlling for temperature and competitors is **-0.8119**. This is a fairly strong negative linear relationship. This means higher prices are associated with lower sales and lower prices are associated with higher sales, while controlling for temperature and competitors.

The p-value for this partial correlation is **0.0013**, which indicates a statistically significant partial correlation at the 0.05 significance level.
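One way to see what "controlling for" means is to reproduce the estimate by hand. The sketch below uses only base R: it regresses price and sales separately on the two control variables, correlates the residuals, and then computes a t-statistic with n - 2 - gp degrees of freedom. This illustrates the idea and is not necessarily how **ppcor** computes the result internally; the names `price_resid`, `sales_resid`, `r_partial`, `t_stat` and `p_value` are chosen here for illustration.

```r
# Same dataset as above, repeated so this snippet runs on its own
data <- data.frame(price = c(2, 2, 1, 2, 3, 3, 2, 1, 1, 2, 1, 2, 2, 1),
                   sales = c(30, 34, 40, 30, 25, 22, 29, 31, 31, 21, 39, 27, 28, 38),
                   temp = c(80, 84, 90, 81, 76, 77, 78, 82, 84, 88, 69, 75, 80, 70),
                   comp = c(2, 5, 4, 2, 2, 3, 3, 1, 2, 4, 5, 2, 4, 2))

# Strip the linear effect of temp and comp out of each variable of interest
price_resid <- resid(lm(price ~ temp + comp, data = data))
sales_resid <- resid(lm(sales ~ temp + comp, data = data))

# The correlation of the residuals is the partial correlation
r_partial <- cor(price_resid, sales_resid)

# t-test for the partial correlation with n - 2 - gp degrees of freedom
n <- nrow(data)   # sample size (14)
gp <- 2           # number of control variables
df <- n - 2 - gp
t_stat <- r_partial * sqrt(df / (1 - r_partial^2))
p_value <- 2 * pt(-abs(t_stat), df = df)

c(estimate = r_partial, statistic = t_stat, p.value = p_value)
```

The three values should agree with the `estimate`, `statistic` and `p.value` columns reported by `pcor.test()` above.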

Note that we could also find the partial correlation between price and sales while controlling for just one variable, such as temperature:

    #find partial correlation between price and sales; control for temp
    pcor.test(data$price, data$sales, data$temp)

    #    estimate     p.value statistic  n gp  Method
    #1 -0.7874614 0.001395208 -4.237292 14  1 pearson
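With a single control variable, the partial correlation can also be written directly in terms of the three pairwise Pearson correlations using the standard first-order formula: (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The following base R sketch applies it to price (x), sales (y) and temperature (z) as a cross-check; the variable names are chosen here for illustration.

```r
# Same dataset as above, repeated so this snippet runs on its own
data <- data.frame(price = c(2, 2, 1, 2, 3, 3, 2, 1, 1, 2, 1, 2, 2, 1),
                   sales = c(30, 34, 40, 30, 25, 22, 29, 31, 31, 21, 39, 27, 28, 38),
                   temp = c(80, 84, 90, 81, 76, 77, 78, 82, 84, 88, 69, 75, 80, 70),
                   comp = c(2, 5, 4, 2, 2, 3, 3, 1, 2, 4, 5, 2, 4, 2))

# Pairwise Pearson correlations among price (x), sales (y) and temp (z)
r_xy <- cor(data$price, data$sales)
r_xz <- cor(data$price, data$temp)
r_yz <- cor(data$sales, data$temp)

# First-order partial correlation of x and y given z
r_xy_z <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
r_xy_z
```

The result should match the `estimate` of about -0.7875 shown above.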

In addition, we could find the partial correlation for each combination of variables by simply using the **pcor()** function:

    pcor(data)

    #$estimate
    #           price      sales       temp      comp
    #price  1.0000000 -0.8119031 -0.3305925 0.3209308
    #sales -0.8119031  1.0000000 -0.3684723 0.4022136
    #temp  -0.3305925 -0.3684723  1.0000000 0.2635285
    #comp   0.3209308  0.4022136  0.2635285 1.0000000
    #
    #$p.value
    #            price       sales      temp      comp
    #price 0.000000000 0.001339438 0.2939198 0.3090995
    #sales 0.001339438 0.000000000 0.2385714 0.1949166
    #temp  0.293919752 0.238571408 0.0000000 0.4078932
    #comp  0.309099542 0.194916639 0.4078932 0.0000000
    #
    #$statistic
    #          price     sales       temp      comp
    #price  0.000000 -4.397905 -1.1077078 1.0715548
    #sales -4.397905  0.000000 -1.2534027 1.3892378
    #temp  -1.107708 -1.253403  0.0000000 0.8638872
    #comp   1.071555  1.389238  0.8638872 0.0000000
    #
    #$n
    #[1] 14
    #
    #$gp
    #[1] 2
    #
    #$method
    #[1] "pearson"

The first matrix in the output shows the partial correlation for each combination of variables while controlling for all other variables in the dataset. For example:

- The partial correlation between price and sales while controlling for temperature and competitors is **-0.8119**.
- The partial correlation between price and temperature while controlling for sales and competitors is **-0.3306**.
- The partial correlation between price and number of competitors while controlling for temperature and sales is **0.3209**.
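A full matrix of partial correlations like this one can also be obtained from the inverse of the correlation matrix: the partial correlation between variables i and j, given all the others, equals minus the (i, j) entry of the inverse divided by the square root of the product of the corresponding diagonal entries. The sketch below does this in base R with `solve()` and `cov2cor()`. It is offered as an illustration of the underlying identity, and the names `prec` and `pc` are chosen here.

```r
# Same dataset as above, repeated so this snippet runs on its own
data <- data.frame(price = c(2, 2, 1, 2, 3, 3, 2, 1, 1, 2, 1, 2, 2, 1),
                   sales = c(30, 34, 40, 30, 25, 22, 29, 31, 31, 21, 39, 27, 28, 38),
                   temp = c(80, 84, 90, 81, 76, 77, 78, 82, 84, 88, 69, 75, 80, 70),
                   comp = c(2, 5, 4, 2, 2, 3, 3, 1, 2, 4, 5, 2, 4, 2))

# Invert the correlation matrix
prec <- solve(cor(data))

# Scale by the diagonal and flip the sign of the off-diagonal entries
pc <- -cov2cor(prec)
diag(pc) <- 1

round(pc, 4)
```

Each off-diagonal entry of `pc` should agree with the corresponding entry of `$estimate` in the `pcor(data)` output.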

The second matrix in the output shows the p-value for each test of partial correlation. For example:

- The p-value for the partial correlation between price and sales while controlling for temperature and competitors is **0.0013**. This is significant at the 0.05 significance level.
- The p-value for the partial correlation between price and temperature while controlling for sales and competitors is **0.2939**. This is not significant at the 0.05 significance level.
- The p-value for the partial correlation between price and number of competitors while controlling for temperature and sales is **0.3091**. This is not significant at the 0.05 significance level.

To find out more about the **ppcor** package, see its full documentation.