How to Calculate Correlation By Group in Pandas


You can use the following basic syntax to calculate the correlation between two variables by group in pandas:

df.groupby('group_var')[['values1','values2']].corr().unstack().iloc[:,1]

The following example shows how to use this syntax in practice.

Example: Calculate Correlation By Group in Pandas

Suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [2, 7, 9, 3, 12, 10, 14, 21]})

#view DataFrame
print(df)

  team  points  assists
0    A      18        2
1    A      22        7
2    A      19        9
3    A      14        3
4    B      14       12
5    B      11       10
6    B      20       14
7    B      28       21

We can use the following code to calculate the correlation between points and assists, grouped by team:

#calculate correlation between points and assists, grouped by team
df.groupby('team')[['points','assists']].corr().unstack().iloc[:,1]

team
A    0.603053
B    0.981798
Name: (points, assists), dtype: float64

From the output we can see:

  • The correlation coefficient between points and assists for team A is 0.603053.
  • The correlation coefficient between points and assists for team B is 0.981798.

Both correlation coefficients are positive, so points and assists are positively related within each team.

That is, players who score more points also tend to record more assists, and the relationship is noticeably stronger for team B than for team A.
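
As a cross-check, the same per-team values can be computed one group at a time with Series.corr(). The following is a minimal sketch rather than a replacement for the syntax above; it assumes the same example DataFrame, and the variable name corr_by_team is only illustrative:

```python
import pandas as pd

# same example data as above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [2, 7, 9, 3, 12, 10, 14, 21]})

# select the two columns first, then compute one Pearson correlation per team
corr_by_team = (df.groupby('team')[['points', 'assists']]
                  .apply(lambda g: g['points'].corr(g['assists'])))

# one value per team, indexed by team
print(corr_by_team.round(6))
```

The printed values should agree with the coefficients reported above for teams A and B.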

Related: What is Considered to Be a “Strong” Correlation?

Note that we could shorten the syntax by dropping the unstack() and iloc steps, but the output is harder to read:

df.groupby('team')[['points','assists']].corr()

                points   assists
team
A    points   1.000000  0.603053
     assists  0.603053  1.000000
B    points   1.000000  0.981798
     assists  0.981798  1.000000

This syntax produces a full correlation matrix for each team, which is more information than we need when we only want the correlation between points and assists.
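
The chained syntax used earlier trims this matrix in two steps: unstack() moves the inner index level into the columns so that each team occupies a single row, and iloc[:, 1] keeps the second column, which is the (points, assists) pair. Below is a sketch of the same steps that selects that column by label instead of by position; the variable names wide and result are illustrative, not part of pandas:

```python
import pandas as pd

# same example data as above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [2, 7, 9, 3, 12, 10, 14, 21]})

# one row per team; the columns become pairs such as ('points', 'assists')
wide = df.groupby('team')[['points', 'assists']].corr().unstack()
print(wide.columns.tolist())

# pick the pair by label rather than by column position
result = wide[('points', 'assists')]
print(result)
```

Selecting by label does not depend on where the pair happens to sit among the columns, so it keeps working if more columns are added to the selection.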

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Perform a GroupBy Sum in Pandas
How to Use Groupby and Plot in Pandas
How to Count Unique Values Using GroupBy in Pandas
