You can use the normalize argument within the pandas crosstab() function to create a crosstab that displays percentage values instead of counts:
pd.crosstab(df.col1, df.col2, normalize='index')
The normalize argument accepts three different arguments:
- all: Display percentage relative to all values.
- index: Display percentage as total of row values.
- columns: Display percentage as total of column values.
The following examples show how to use each of these methods in practice with the following pandas DataFrame:
import pandas as pd #create DataFrame df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'], 'position':['G', 'G', 'F', 'G', 'F', 'F', 'F', 'G', 'G', 'F', 'F'], 'points': [22, 25, 24, 39, 34, 20, 18, 17, 20, 19, 22]}) #view DataFrame print(df) team position points 0 A G 22 1 A G 25 2 A F 24 3 B G 39 4 B F 34 5 B F 20 6 B F 18 7 C G 17 8 C G 20 9 C F 19 10 C F 22
Here is what the default crosstab would look like for the count of players by team and position:
#create crosstab that displays count by team and position
pd.crosstab(df.team, df.position)
position F G
team
A 1 2
B 3 1
C 2 2
Example 1: Create Crosstab with Percentages Relative to All Values
We can use the crosstab() function with the argument normalize=all to create a crosstab that displays percentages of each value relative to the total count of all values:
#create crosstab that displays counts as percentage relative to total count pd.crosstab(df.team, df.position, normalize='all') position F G team A 0.090909 0.181818 B 0.272727 0.090909 C 0.181818 0.181818
Here is how to interpret the output:
- Players on team A in position F account for 9.09% of total players.
- Players on team A in position G account for 18.18% of total players.
And so on.
Example 2: Create Crosstab with Percentages Relative to Row Totals
We can use the crosstab() function with the argument normalize=index to create a crosstab that displays percentages of each value relative to the row total:
#create crosstab that displays counts as percentage relative to row totals pd.crosstab(df.team, df.position, normalize='index') position F G team A 0.333333 0.666667 B 0.750000 0.250000 C 0.500000 0.500000
Here is how to interpret the output:
- Players in position F account for 33.33% of total players on team A.
- Players in position F account for 75% of total players on team B.
- Players in position F account for 50% of total players on team C.
And so on.
Example 3: Create Crosstab with Percentages Relative to Column Totals
We can use the crosstab() function with the argument normalize=columns to create a crosstab that displays percentages of each value relative to the column total:
#create crosstab that displays counts as percentage relative to column totals pd.crosstab(df.team, df.position, normalize='columns') position F G team A 0.166667 0.4 B 0.500000 0.2 C 0.333333 0.4
Here is how to interpret the output:
- Players on team A account for 16.67% of total players with a position of F.
- Players on team B account for 50% of total players with a position of F.
- Players on team C account for 33.33% of total players with a position of F.
And so on.
Note: You can find the complete documentation for the pandas crosstab() function here.
Additional Resources
The following tutorials explain how to perform other common tasks in pandas:
Pandas: How to Use GroupBy and Value Counts
Pandas: How to Use GroupBy with Bin Counts
Pandas: How to Count Values in Column with Condition