You can use the following basic syntax to calculate the correlation between two variables by group in pandas:

df.groupby('group_var')[['values1','values2']].corr().unstack().iloc[:,1]
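To see why the chain ends with `.iloc[:,1]`, it can help to run it one step at a time. The sketch below does this on a small made-up DataFrame: the data values are illustrative assumptions, and the column names simply reuse the placeholders from the syntax above.

```python
import pandas as pd

# hypothetical data, used only to show the shape of each intermediate result
df = pd.DataFrame({'group_var': ['x', 'x', 'x', 'y', 'y', 'y'],
                   'values1': [1, 2, 4, 3, 5, 9],
                   'values2': [2, 3, 7, 8, 6, 1]})

# step 1: one 2x2 correlation matrix per group, stacked under a two-level index
matrices = df.groupby('group_var')[['values1', 'values2']].corr()

# step 2: move the inner index level into the columns, leaving one row per group
wide = matrices.unstack()

# the four columns correspond to the four cells of each 2x2 matrix
print(list(wide.columns))

# step 3: the column at position 1 pairs values1 with values2
result = wide.iloc[:, 1]
print(result.name)
```

Because the last step selects by position, it relies on the column order produced by `unstack`, so it is worth checking `result.name` when adapting the chain to other columns.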

The following example shows how to use this syntax in practice.

**Example: Calculate Correlation By Group in Pandas**

Suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [2, 7, 9, 3, 12, 10, 14, 21]})

#view DataFrame
print(df)

We can use the following code to calculate the correlation between **points** and **assists**, grouped by **team**:

#calculate correlation between points and assists, grouped by team
df.groupby('team')[['points','assists']].corr().unstack().iloc[:,1]

team
A    0.603053
B    0.981798
Name: (points, assists), dtype: float64

From the output we can see:

- The correlation coefficient between points and assists for team A is **0.603053**.
- The correlation coefficient between points and assists for team B is **0.981798**.

Since both correlation coefficients are positive, points and assists are positively related within both teams, and the relationship is considerably stronger for team B.

That is, players who tend to score more points also tend to record more assists.
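As a cross-check on the two coefficients above, the same numbers can be computed one group at a time with `Series.corr`, which returns the Pearson correlation by default. This is a minimal sketch rather than a replacement for the one-liner; the variable name `corr_by_team` is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [2, 7, 9, 3, 12, 10, 14, 21]})

# iterate over the groups and correlate the two columns within each team
corr_by_team = {team: grp['points'].corr(grp['assists'])
                for team, grp in df.groupby('team')}

for team, r in corr_by_team.items():
    print(team, round(r, 6))
```

The values should agree with the grouped result shown earlier, which is a quick way to confirm that position 1 really is the points/assists pair.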

**Related:** What is Considered to Be a “Strong” Correlation?

Note that we could shorten the syntax by leaving out **unstack** and **iloc**, but the output is harder to read:

df.groupby('team')[['points','assists']].corr()

                points   assists
team                            
A    points   1.000000  0.603053
     assists  0.603053  1.000000
B    points   1.000000  0.981798
     assists  0.981798  1.000000

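This syntax produces a full correlation matrix for each team, which is more information than we need: each matrix lists the coefficient of interest twice and also includes the trivial correlations of 1 on the diagonal.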
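If the full matrix is already at hand, the single coefficient per team can also be pulled out by label instead of by position. The sketch below is one possible way to do this with `DataFrame.xs`; the names `matrices` and `by_label` are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [2, 7, 9, 3, 12, 10, 14, 21]})

# full correlation matrix for each team
matrices = df.groupby('team')[['points', 'assists']].corr()

# keep only the 'points' row of each team's matrix (inner index level),
# then read the 'assists' column of that row
by_label = matrices.xs('points', level=1)['assists']
print(by_label)
```

Selecting by label does not depend on the order in which the columns were listed, which may make it a little more robust than `.iloc[:,1]` when more than two columns are involved.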

**Additional Resources**

The following tutorials explain how to perform other common operations in pandas:

How to Perform a GroupBy Sum in Pandas

How to Use Groupby and Plot in Pandas

How to Count Unique Values Using GroupBy in Pandas