Pandas: How to Create Crosstab with Percentages


You can use the normalize argument within the pandas crosstab() function to create a crosstab that displays percentage values instead of counts:

pd.crosstab(df.col1, df.col2, normalize='index')

The normalize argument accepts three different arguments:

  • all: Display percentage relative to all values.
  • index: Display percentage as total of row values.
  • columns: Display percentage as total of column values.

The following examples show how to use each of these methods in practice with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
                   'position':['G', 'G', 'F', 'G', 'F', 'F', 'F', 'G', 'G', 'F', 'F'],
                   'points': [22, 25, 24, 39, 34, 20, 18, 17, 20, 19, 22]})

#view DataFrame
print(df)

   team position  points
0     A        G      22
1     A        G      25
2     A        F      24
3     B        G      39
4     B        F      34
5     B        F      20
6     B        F      18
7     C        G      17
8     C        G      20
9     C        F      19
10    C        F      22

Here is what the default crosstab would look like for the count of players by team and position:

#create crosstab that displays count by team and position
pd.crosstab(df.team, df.position)

position  F	G
team		
A	  1	2
B	  3	1
C	  2	2

Example 1: Create Crosstab with Percentages Relative to All Values

We can use the crosstab() function with the argument normalize=all to create a crosstab that displays percentages of each value relative to the total count of all values:

#create crosstab that displays counts as percentage relative to total count
pd.crosstab(df.team, df.position, normalize='all')

position	F	       G
team		
A	0.090909	0.181818
B	0.272727	0.090909
C	0.181818	0.181818

Here is how to interpret the output:

  • Players on team A in position F account for 9.09% of total players.
  • Players on team A in position G account for 18.18% of total players.

And so on.

Example 2: Create Crosstab with Percentages Relative to Row Totals

We can use the crosstab() function with the argument normalize=index to create a crosstab that displays percentages of each value relative to the row total:

#create crosstab that displays counts as percentage relative to row totals
pd.crosstab(df.team, df.position, normalize='index')

position	F	       G
team		
A	0.333333	0.666667
B	0.750000	0.250000
C	0.500000	0.500000

Here is how to interpret the output:

  • Players in position F account for 33.33% of total players on team A.
  • Players in position F account for 75% of total players on team B.
  • Players in position F account for 50% of total players on team C.

And so on.

Example 3: Create Crosstab with Percentages Relative to Column Totals

We can use the crosstab() function with the argument normalize=columns to create a crosstab that displays percentages of each value relative to the column total:

#create crosstab that displays counts as percentage relative to column totals
pd.crosstab(df.team, df.position, normalize='columns')

position	F	  G
team		
A	0.166667	0.4
B	0.500000	0.2
C	0.333333	0.4

Here is how to interpret the output:

  • Players on team A account for 16.67% of total players with a position of F.
  • Players on team B account for 50% of total players with a position of F.
  • Players on team C account for 33.33% of total players with a position of F.

And so on.

Note: You can find the complete documentation for the pandas crosstab() function here.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: How to Use GroupBy and Value Counts
Pandas: How to Use GroupBy with Bin Counts
Pandas: How to Count Values in Column with Condition

Featured Posts

Leave a Reply

Your email address will not be published. Required fields are marked *