How to Perform a GroupBy Sum in Pandas (With Examples)


You can use the following basic syntax to find the sum of values by group in pandas:

df.groupby(['group1','group2'])['sum_col'].sum().reset_index()

The following examples show how to use this syntax in practice with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'F', 'C', 'G', 'F', 'F', 'C'],
                   'points': [25, 17, 14, 9, 12, 9, 6, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame
df

	team	position points	rebounds
0	A	G	 25	11
1	A	G	 17	8
2	A	F	 14	10
3	A	C	 9	6
4	B	G	 12	6
5	B	F	 9	5
6	B	F	 6	9
7	B	C	 4	12

Example 1: Group by One Column, Sum One Column

The following code shows how to group by one column and sum the values in one column:

#group by team and sum the points
df.groupby(['team'])['points'].sum().reset_index()

	team	points
0	A	65
1	B	31

From the output we can see that:

  • The players on team A scored a sum of 65 points.
  • The players on team B scored a sum of 31 points.

Example 2: Group by Multiple Columns, Sum Multiple Columns

The following code shows how to group by multiple columns and sum multiple columns:

#group by team and position, sum points and rebounds
df.groupby(['team', 'position'])['points', 'rebounds'].sum().reset_index()

        team	position points	rebounds
0	A	C	 9	6
1	A	F	 14	10
2	A	G	 42	19
3	B	C	 4	12
4	B	F	 15	14
5	B	G	 12	6

From the output we can see that:

  • The players on team A in the ‘C’ position scored a sum of 9 points and 6 rebounds.
  • The players on team A in the ‘F’ position scored a sum of 14 points and 10 rebounds.
  • The players on team A in the ‘G’ position scored a sum of 42 points and 19 rebounds.

And so on.

Note that the reset_index() function prevents the grouping columns from becoming part of the index.

For example, here’s what the output looks like if we don’t use it:

#group by team and position, sum points and rebounds
df.groupby(['team', 'position'])['points', 'rebounds'].sum()

                 points	rebounds
team	position		
A	C	 9	6
F	14	 10
G	42	 19
B	C	 4	12
F	15	 14
G	12	 6

Depending on how you’d like the results to appear, you may or may not choose to use the reset_index() function.

Additional Resources

The following tutorials explain how to perform other common grouping operations in pandas:

How to Count Observations by Group in Pandas
How to Find the Max Value by Group in Pandas
How to Calculate Quantiles by Group in Pandas

Leave a Reply

Your email address will not be published.