How to Convert Pandas GroupBy Output to DataFrame


This tutorial explains how to convert the output of a pandas GroupBy into a pandas DataFrame.

Example: Convert Pandas GroupBy Output to DataFrame

Suppose we have the following pandas DataFrame that shows the points scored by basketball players on various teams:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'F', 'C', 'G', 'F', 'F', 'F'],
                   'points': [5, 7, 7, 10, 12, 22, 15, 10]})

#view DataFrame
print(df)

  team position  points
0    A        G       5
1    A        G       7
2    A        F       7
3    A        C      10
4    B        G      12
5    B        F      22
6    B        F      15
7    B        F      10

We can use the following syntax to count the number of players, grouped by team and position:

#count number of players, grouped by team and position
group = df.groupby(['team', 'position']).size()

#view output
print(group)

team  position
A     C           1
      F           1
      G           2
B     F           3
      G           1
dtype: int64

From the output, we can see the total count of players, grouped by team and position.
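Before reshaping the result, it can help to check what kind of object size() actually returns. The following is a minimal sketch that rebuilds the same df and inspects the grouped output; the lookup of the ('B', 'F') group is only an illustration, not part of the original example:

```python
import pandas as pd

# same example data as above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'F', 'C', 'G', 'F', 'F', 'F'],
                   'points': [5, 7, 7, 10, 12, 22, 15, 10]})

group = df.groupby(['team', 'position']).size()

# size() returns a Series, not a DataFrame
print(isinstance(group, pd.Series))

# the group keys are stored as index levels rather than as columns
print(list(group.index.names))

# a single count is looked up with a (team, position) tuple
print(group.loc[('B', 'F')])
```

Because the team and position values live in a two-level index, pandas prints each team label only once, which is why the rows for team A and team B appear to be missing the team name.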

However, suppose we want our output to display the team name in each row like this:

  team position  count
0    A        C      1
1    A        F      1
2    A        G      2
3    B        F      3
4    B        G      1

To achieve this output, we can chain reset_index() onto the result of size(), which moves the team and position index levels into regular columns:

#count number of players, grouped by team and position
df_out = df.groupby(['team', 'position']).size().reset_index(name='count')

#view output
print(df_out)

  team position  count
0    A        C      1
1    A        F      1
2    A        G      2
3    B        F      3
4    B        G      1

The output now appears in the format that we wanted.

Note that the name argument within reset_index() specifies the name of the column that holds the counts. If we leave it out, that column is simply labelled 0.
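To make the effect of the name argument concrete, here is a small sketch comparing the column labels with and without it. It also shows one possible alternative, passing as_index=False to groupby(), which is not used in the example above; the 'size' column label it produces is the behaviour assumed for pandas 1.1 and later:

```python
import pandas as pd

# same example data as above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'F', 'C', 'G', 'F', 'F', 'F'],
                   'points': [5, 7, 7, 10, 12, 22, 15, 10]})

counts = df.groupby(['team', 'position']).size()

# without name=, the values of the unnamed Series land in a column labelled 0
default_name = counts.reset_index()
print(default_name.columns.tolist())

# with name=, the values column gets the label we chose
named = counts.reset_index(name='count')
print(named.columns.tolist())

# alternative (assumes pandas 1.1+): keep the keys as columns from the start;
# the counts column is then labelled 'size'
alt = df.groupby(['team', 'position'], as_index=False).size()
print(alt.columns.tolist())
```

Either route ends in a DataFrame; reset_index(name=...) is convenient when we want to pick the column label in the same step.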

We can also confirm that the result is indeed a pandas DataFrame:

#display object type of df_out
print(type(df_out))

<class 'pandas.core.frame.DataFrame'>
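The same reset_index() pattern should carry over to aggregations other than size(). As a hedged sketch, the following groups by team only and computes the total and average points; the column names total_points and avg_points are made up for this illustration:

```python
import pandas as pd

# same example data as above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'F', 'C', 'G', 'F', 'F', 'F'],
                   'points': [5, 7, 7, 10, 12, 22, 15, 10]})

# named aggregation: each keyword becomes an output column
summary = (df.groupby('team')
             .agg(total_points=('points', 'sum'),
                  avg_points=('points', 'mean'))
             .reset_index())  # move 'team' from the index into a column

print(summary)
print(isinstance(summary, pd.DataFrame))
```

Here agg() already returns a DataFrame, so reset_index() only moves the team labels out of the index and into an ordinary column.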

Note: You can find the complete documentation for the GroupBy operation in the official pandas documentation.

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

Pandas: How to Calculate Cumulative Sum by Group
Pandas: How to Count Unique Values by Group
Pandas: How to Calculate Correlation By Group
