# How to Calculate Correlation By Group in Pandas

You can use the following basic syntax to calculate the correlation between two variables by group in pandas:

```df.groupby('group_var')[['values1','values2']].corr().unstack().iloc[:,1]
```

The following example shows how to use this syntax in practice.

### Example: Calculate Correlation By Group in Pandas

Suppose we have the following pandas DataFrame:

```import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
'points': [18, 22, 19, 14, 14, 11, 20, 28],
'assists': [2, 7, 9, 3, 12, 10, 14, 21]})

#view DataFrame
print(df)```

We can use the following code to calculate the correlation between points and assists, grouped by team:

```#calculate correlation between points and assists, grouped by team
df.groupby('team')[['points','assists']].corr().unstack().iloc[:,1]

team
A    0.603053
B    0.981798
Name: (points, assists), dtype: float64```

From the output we can see:

• The correlation coefficient between points and assists for team A is .603053.
• The correlation coefficient between points and assists for team B is .981798.

Since both correlation coefficients are positive, this tells us that the relationship between points and assists for both teams is positive.

That is, players who tend to score more points also tend to record more assists.

Note that we could shorten the syntax by not using the unstack and iloc functions, but the results are uglier:

```df.groupby('team')[['points','assists']].corr()

points	  assists
team
A	points	1.000000  0.603053
assists	0.603053  1.000000
B	points	1.000000  0.981798
assists	0.981798  1.000000
```

This syntax produces a correlation matrix for both teams, which provides us with excessive information.