You can use the following syntax to calculate the percentage of a total within groups in pandas:
df['values_var'] / df.groupby('group_var')['values_var'].transform('sum')
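This works because `transform('sum')` returns a Series with the same index as the original DataFrame, where each row holds the total of the group it belongs to. Dividing the original column by it therefore lines up row by row. The minimal sketch below uses a small hypothetical DataFrame (the `group_var`/`values_var` names come from the syntax above, the data is made up). It contrasts `transform` with a plain `groupby().sum()`, which returns one row per group instead:

```python
import pandas as pd

# Small hypothetical DataFrame with a grouping column and a values column
df = pd.DataFrame({'group_var': ['x', 'x', 'y'],
                   'values_var': [1, 3, 6]})

# Aggregation: one value per group, indexed by the group labels
group_totals = df.groupby('group_var')['values_var'].sum()
print(group_totals)

# transform: one value per original row, aligned to df's index
row_totals = df.groupby('group_var')['values_var'].transform('sum')
print(row_totals)

# Element-wise division gives each row's share of its group's total
share = df['values_var'] / row_totals
print(share)
```

Because `row_totals` shares the index of `df`, no merge or join is needed before dividing.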
The following example shows how to use this syntax in practice.
Example: Calculate Percentage of Total Within Group
Suppose we have the following pandas DataFrame that shows the points scored by basketball players on various teams:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [12, 29, 34, 14, 10, 11, 7, 36, 34, 22]})
#view DataFrame
print(df)
  team  points
0    A      12
1    A      29
2    A      34
3    A      14
4    A      10
5    B      11
6    B       7
7    B      36
8    B      34
9    B      22
We can use the following syntax to create a new column in the DataFrame that shows each player's share of their team's total points:
#calculate percentage of total points scored grouped by team
df['team_percent'] = df['points'] / df.groupby('team')['points'].transform('sum')
#view updated DataFrame
print(df)
  team  points  team_percent
0    A      12      0.121212
1    A      29      0.292929
2    A      34      0.343434
3    A      14      0.141414
4    A      10      0.101010
5    B      11      0.100000
6    B       7      0.063636
7    B      36      0.327273
8    B      34      0.309091
9    B      22      0.200000
The team_percent column shows the share of their team's total points that each player scored, expressed as a proportion between 0 and 1 (multiply by 100 to express it as a percentage).
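If you prefer true percentages, you can scale the proportions by 100 and round them. The sketch below reuses the example data; the column name `team_pct_100` is a hypothetical choice, not part of the original example. It also checks that the proportions within each team add up to 1, which is a quick way to confirm the grouping is correct:

```python
import pandas as pd

# Same data as the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [12, 29, 34, 14, 10, 11, 7, 36, 34, 22]})

# Proportion of each team's total, as in the example
df['team_percent'] = df['points'] / df.groupby('team')['points'].transform('sum')

# Scale to percentages and round to two decimals (hypothetical column name)
df['team_pct_100'] = (df['team_percent'] * 100).round(2)
print(df)

# Sanity check: within each team the proportions should sum to 1
print(df.groupby('team')['team_percent'].sum())
```

The sums may differ from 1 by a tiny floating-point error, so compare them with a tolerance rather than exact equality.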
For example, players on team A scored a total of 99 points.
Thus, the player in the first row of the DataFrame, who scored 12 points, accounted for 12/99 = 12.12% of team A's total points.
Similarly, the player in the second row, who scored 29 points, accounted for 29/99 = 29.29% of team A's total points.
And so on.
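The team totals used in this arithmetic can be computed directly with `groupby().sum()`. A brief check, reusing the example data, that reproduces the denominators (99 for team A, 110 for team B) and the first two worked values:

```python
import pandas as pd

# Same data as the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [12, 29, 34, 14, 10, 11, 7, 36, 34, 22]})

# Total points per team: the denominators used for each player's share
team_totals = df.groupby('team')['points'].sum()
print(team_totals)

# Reproduce the worked examples for the first two rows (team A)
first_share = 12 / team_totals['A']
second_share = 29 / team_totals['A']
print(first_share, second_share)
```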
Note: You can find the complete documentation for the pandas groupby() function in the official pandas documentation.
Additional Resources
The following tutorials explain how to perform other common operations in pandas:
Pandas: How to Calculate Cumulative Sum by Group
Pandas: How to Count Unique Values by Group
Pandas: How to Calculate Mode by Group
Pandas: How to Calculate Correlation By Group