Pandas: How to Group and Aggregate by Multiple Columns


Often you may want to group and aggregate by multiple columns of a pandas DataFrame. Fortunately this is easy to do using the pandas .groupby() and .agg() functions.

This tutorial explains several examples of how to use these functions in practice.

Example 1: Group by Two Columns and Find Average

Suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'B', 'B', 'B', 'M', 'M', 'M'],
                   'position': ['G', 'G', 'F', 'G', 'F', 'F', 'C', 'C'],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10]})

#view DataFrame
print(df)

  team position  assists  rebounds
0    A        G        5        11
1    B        G        7         8
2    B        F        7        10
3    B        G        8         6
4    B        F        5         6
5    M        F        7         9
6    M        C        6         6
7    M        C        9        10

The following code shows how to group by columns ‘team’ and ‘position’ and find the mean assists:

df.groupby(['team', 'position']).agg({'assists': ['mean']}).reset_index()


        team	position  assists
                          mean
0	A	G	  5.0
1	B	F	  6.0
2	B	G	  7.5
3	M	C	  7.5
4	M	F	  7.0

The output tells us:

  • The mean assists for players in position G on team A is 5.0.
  • The mean assists for players in position F on team B is 6.0.
  • The mean assists for players in position G on team B is 7.5.

And so on.

We can also use the following code to rename the columns in the resulting DataFrame:

#group by team and position and find mean assists
new = df.groupby(['team', 'position']).agg({'assists': ['mean']}).reset_index()

#rename columns 
new.columns = ['team', 'pos', 'mean_assists']

#view DataFrame
print(new)

	team	pos	mean_assists
0	A	G	5.0
1	B	F	6.0
2	B	G	7.5
3	M	C	7.5
4	M	F	7.0

Example 2: Group by Two Columns and Find Multiple Stats

Assume we use the same pandas DataFrame as the previous example:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'B', 'B', 'B', 'M', 'M', 'M'],
                   'position': ['G', 'G', 'F', 'G', 'F', 'F', 'C', 'C'],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10]})

The following code shows how to find the median and max number of rebounds, grouped on columns ‘team’ and ‘position’:

df.groupby(['team', 'position']).agg({'rebounds': ['median', 'max']}).reset_index()


        team	position	rebounds
                         median	max
0	A	G	 11	11
1	B	F	 8	10
2	B	G	 7	8
3	M	C	 8	10
4	M	F	 9	9

The output tells us:

  • The median rebounds assists for players in position G on team A is 11.
  • The max rebounds for players in position G on team A is 11.
  • The median rebounds for players in position F on team B is 8.
  • The max rebounds for players in position F on team B is 10.

And so on.

Additional Resources

How to Filter a Pandas DataFrame on Multiple Conditions
How to Count Missing Values in a Pandas DataFrame
How to Stack Multiple Pandas DataFrames

Leave a Reply

Your email address will not be published. Required fields are marked *