Pandas: How to Rename Columns in Groupby Function


You can use the following basic syntax to name the aggregated columns produced by groupby() and agg() in pandas:

df.groupby('group_col').agg(sum_col1=('col1', 'sum'),
                            mean_col2=('col2', 'mean'),
                            max_col3=('col3', 'max'))

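This particular example calculates three aggregated columns and names them sum_col1, mean_col2, and max_col3. Each keyword argument name becomes the name of an output column, and its value is a tuple of the form (input column, aggregation function). pandas calls this syntax named aggregation, and it is available in pandas 0.25.0 and later.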
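The (column, function) tuples are a shorthand for pandas' pd.NamedAgg, which spells out the same information with explicit column= and aggfunc= fields. The sketch below assumes a small made-up DataFrame that reuses the placeholder names group_col, col1, col2, and col3 from the syntax above, and checks that both forms produce the same result:

```python
import pandas as pd

# Small illustrative DataFrame using the placeholder column names from the syntax above
df = pd.DataFrame({'group_col': ['x', 'x', 'y', 'y'],
                   'col1': [1, 2, 3, 4],
                   'col2': [10, 20, 30, 40],
                   'col3': [5, 7, 6, 8]})

# Tuple shorthand: output_name=(input_column, aggregation)
short = df.groupby('group_col').agg(sum_col1=('col1', 'sum'),
                                    mean_col2=('col2', 'mean'),
                                    max_col3=('col3', 'max'))

# Equivalent explicit form using pd.NamedAgg
explicit = df.groupby('group_col').agg(
    sum_col1=pd.NamedAgg(column='col1', aggfunc='sum'),
    mean_col2=pd.NamedAgg(column='col2', aggfunc='mean'),
    max_col3=pd.NamedAgg(column='col3', aggfunc='max'))

print(explicit)
print(short.equals(explicit))  # True: both forms build the same DataFrame
```

The tuple form is shorter; pd.NamedAgg can make longer aggregation lists easier to read.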

The following example shows how to use this syntax in practice.

Example: Rename Columns in Groupby Function in Pandas

Suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 6, 6, 5, 8, 7, 7, 9],
                   'rebounds': [4, 13, 15, 10, 7, 7, 5, 11]})

#view DataFrame
print(df)

  team  points  assists  rebounds
0    A      30        5         4
1    A      22        6        13
2    A      19        6        15
3    A      14        5        10
4    B      14        8         7
5    B      11        7         7
6    B      20        7         5
7    B      28        9        11

We can use the following syntax to group the rows by the team column, calculate three aggregated columns, and give each one a custom name:

#calculate several aggregated columns by group and rename aggregated columns
df.groupby('team').agg(sum_points=('points', 'sum'),
                       mean_assists=('assists', 'mean'),
                       max_rebounds=('rebounds', 'max'))

      sum_points  mean_assists  max_rebounds
team
A             85          5.50            15
B             73          7.75            11

Notice that the three aggregated columns have the custom names that we provided in the agg() function.
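Keyword arguments must be valid Python identifiers, so a name such as "Total Points" cannot be written directly inside agg(). One workaround, shown here as a sketch with illustrative output names, is to build a dictionary that maps each output name to its (column, function) tuple and unpack it with **:

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 6, 6, 5, 8, 7, 7, 9],
                   'rebounds': [4, 13, 15, 10, 7, 7, 5, 11]})

# Output names with spaces are not valid keyword arguments,
# so map each name to (input column, aggregation) and unpack the dict with **
named_aggs = {'Total Points': ('points', 'sum'),
              'Mean Assists': ('assists', 'mean'),
              'Max Rebounds': ('rebounds', 'max')}

result = df.groupby('team').agg(**named_aggs)
print(result)
```

The values match the previous output; only the column names differ.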

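We could also pass NumPy functions such as np.sum, np.mean, and np.max to agg() instead of the string names. Note that recent pandas versions (2.1 and later) emit a FutureWarning when NumPy functions are passed this way and recommend using the string names instead.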

import numpy as np

#calculate several aggregated columns by group and rename aggregated columns
df.groupby('team').agg(sum_points=('points', np.sum),
                       mean_assists=('assists', np.mean),
                       max_rebounds=('rebounds', np.max))

      sum_points  mean_assists  max_rebounds
team
A             85          5.50            15
B             73          7.75            11

These results match the ones from the previous example.
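Named aggregation is not limited to built-in or NumPy functions: any callable that reduces a column to a single value can be used. The sketch below assumes an illustrative points_range column (max minus min, computed with a lambda) alongside a string aggregation:

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 6, 6, 5, 8, 7, 7, 9],
                   'rebounds': [4, 13, 15, 10, 7, 7, 5, 11]})

# A custom callable (range of points per team) next to a string aggregation
result = df.groupby('team').agg(
    points_range=('points', lambda s: s.max() - s.min()),
    median_rebounds=('rebounds', 'median'))
print(result)
```

For team A the point range is 30 - 14 = 16, and for team B it is 28 - 11 = 17.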

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to List All Column Names in Pandas
How to Sort Columns by Name in Pandas
How to Drop Duplicate Columns in Pandas
