How to Calculate Correlation By Group in R


You can use the following basic syntax to calculate the correlation between two variables by group in R:

library(dplyr)

df %>%
  group_by(group_var) %>%
  summarize(cor=cor(var1, var2))

This particular syntax calculates the correlation between var1 and var2, grouped by group_var.
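By default, cor() computes the Pearson correlation and returns NA for any group that contains a missing value. As a minimal sketch, the same pattern can pass cor()'s own `use` and `method` arguments to drop incomplete pairs or to switch to a rank-based coefficient. The data frame `toy` and its columns `g`, `x`, and `y` are made up purely for illustration.

```r
library(dplyr)

# hypothetical data (not from the example below); y has one missing value in group 'a'
toy <- data.frame(g=c('a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'),
                  x=c(1, 2, 3, 4, 1, 2, 3, 4),
                  y=c(2, 4, NA, 8, 8, 6, 4, 2))

# use='complete.obs' drops pairs with a missing value before computing
# method='spearman' requests the rank-based coefficient instead of Pearson
res <- toy %>%
  group_by(g) %>%
  summarize(pearson=cor(x, y, use='complete.obs'),
            spearman=cor(x, y, use='complete.obs', method='spearman'))

res
```

Without use='complete.obs', the row for group a would show NA instead of a coefficient.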

The following example shows how to use this syntax in practice.

Example: Calculate Correlation By Group in R

Suppose we have the following data frame that contains information about basketball players on various teams:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(18, 22, 19, 14, 14, 11, 20, 28),
                 assists=c(2, 7, 9, 3, 12, 10, 14, 21))

#view data frame
df

  team points assists
1    A     18       2
2    A     22       7
3    A     19       9
4    A     14       3
5    B     14      12
6    B     11      10
7    B     20      14
8    B     28      21

We can use the following syntax from the dplyr package to calculate the correlation between points and assists, grouped by team:

library(dplyr)

df %>%
  group_by(team) %>%
  summarize(cor=cor(points, assists))

# A tibble: 2 x 2
  team    cor
  <chr> <dbl>
1 A     0.603
2 B     0.982

From the output we can see:

  • The correlation coefficient between points and assists for team A is 0.603.
  • The correlation coefficient between points and assists for team B is 0.982.

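Both correlation coefficients are positive, so on each team the players who score more points also tend to record more assists. The relationship is considerably stronger for team B than for team A, although with only four players per team these estimates are rough.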
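As a cross-check that does not depend on dplyr, the same per-team correlations can be sketched in base R by splitting the data frame by team and applying cor() to each piece. This is one possible alternative, not the method used above. The name `cors` is introduced here for illustration.

```r
#recreate the data frame from the example
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(18, 22, 19, 14, 14, 11, 20, 28),
                 assists=c(2, 7, 9, 3, 12, 10, 14, 21))

#split the rows by team, then compute the Pearson correlation within each piece
cors <- sapply(split(df, df$team), function(g) cor(g$points, g$assists))

#view the result as a named vector, rounded to three decimals
round(cors, 3)
```

The rounded values should agree with the 0.603 and 0.982 reported by the dplyr approach.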

Related: What is Considered to Be a “Strong” Correlation?

Additional Resources

The following tutorials explain how to perform other common operations in R:

How to Count Unique Values by Group in R
How to Calculate the Sum by Group in R
How to Calculate the Mean by Group in R
How to Calculate Summary Statistics by Group in R
