Pandas: How to Use GroupBy on a MultiIndex


You can use the following basic syntax to use GroupBy on a pandas DataFrame with a multiindex:

#calculate sum by level 0 and 1 of multiindex
df.groupby(level=[0,1]).sum()

#calculate count by level 0 and 1 of multiindex
df.groupby(level=[0,1]).count()

#calculate max value by level 0 and 1 of multiindex
df.groupby(level=[0,1]).max()

...

Each of these examples calculate some metric grouped by two levels of a multiindex pandas DataFrame.

The following example shows how to use this syntax in practice.

Example: Use GroupBy on MultiIndex in pandas

Suppose we have the following pandas DataFrame with a multiindex:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'],
                   'points': [6, 8, 9, 11, 13, 8, 8, 15]})

#define multiindex
df.set_index(['team', 'position'], inplace=True)

#view DataFrame
print(df)

               points
team position        
A    G              6
     G              8
     F              9
     F             11
B    G             13
     G              8
     F              8
     F             15

We can use the following syntax to calculate the sum of the points values grouped by both levels of the multiindex:

#calculate sum of points grouped by both levels of the multiindex:
df.groupby(level=[0,1]).sum()

		 points
team	position	
A	F	 20
        G        14
B	F	 23
        G        21

We can use similar syntax to calculate the max of the points values grouped by both levels of the multiindex:

#calculate max of points grouped by both levels of the multiindex:
df.groupby(level=[0,1]).max()

		 points
team	position	
A	F	 11
        G         8
B	F	 15
        G        13

We can use similar syntax to calculate any value we’d like grouped by several levels of a multiindex.

Note: You can find the complete documentation for the GroupBy operation in pandas here.

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

Pandas: How to Calculate Cumulative Sum by Group
Pandas: How to Count Unique Values by Group
Pandas: How to Calculate Correlation By Group

Leave a Reply

Your email address will not be published.