You can use the aggfunc argument within the pandas crosstab() function to create a crosstab that aggregates values using a specific metric:
pd.crosstab(index=df.col1, columns=df.col2, values=df.col3, aggfunc='count')
The default value for aggfunc is ‘count’ but you can specify other aggregation methods such as mean, median, sum, min, max, etc.
You can also specify multiple aggregation methods in the aggfunc argument:
pd.crosstab(index=df.col1, columns=df.col2, values=df.col3, aggfunc=['min', 'max'])
The following examples show how to use each of these methods in practice with the following pandas DataFrame:
import pandas as pd #create DataFrame df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'], 'position':['G', 'G', 'F', 'G', 'F', 'F', 'F', 'G', 'G', 'F', 'F'], 'points': [22, 25, 24, 39, 34, 20, 18, 17, 20, 19, 22]}) #view DataFrame print(df) team position points 0 A G 22 1 A G 25 2 A F 24 3 B G 39 4 B F 34 5 B F 20 6 B F 18 7 C G 17 8 C G 20 9 C F 19 10 C F 22
Example 1: Create Crosstab with One Value in aggfunc
We can use the following crosstab() function with the argument aggfunc=’mean’ to create a crosstab that displays the mean value points for each combination of position and team:
#create crosstab that displays mean points value by team and position pd.crosstab(index=df.team, columns=df.position, values=df.points, aggfunc='mean') position F G team A 24.0 23.5 B 24.0 39.0 C 20.5 18.5
Here is how to interpret the output:
- The average points for players on team A in position F is 24.
- The average points for players on team A in position G is 23.5.
And so on.
We can also use a different aggregation metric, such as the maximum value:
#create crosstab that displays max points value by team and position pd.crosstab(index=df.team, columns=df.position, values=df.points, aggfunc='max') position F G team A 24 25 B 34 39 C 22 20
Here is how to interpret the output:
- The max points for players on team A in position F is 24.
- The max points for players on team A in position G is 25.
And so on.
Example 2: Create Crosstab with Multiple Values in aggfunc
We can use the crosstab() function with multiple values in the aggfunc argument to aggregate the points values by multiple metrics for each combination of position and team:
#create crosstab that displays min and max points by team and position pd.crosstab(df.team, df.position, df.points, aggfunc=['min', 'max']) min max position F G F G team A 24 22 24 25 B 18 39 34 39 C 19 17 22 20
Here is how to interpret the output:
- The minimum points value for players on team A in position F is 24.
- The minimum points value for players on team A in position G is 22.
- The maximum points value for players on team A in position F is 24.
- The maximum points value for players on team A in position G is 25.
And so on.
Note: You can find the complete documentation for the pandas crosstab() function here.
Additional Resources
The following tutorials explain how to perform other common tasks in pandas:
Pandas: How to Create Crosstab with Percentages
Pandas: How to Use GroupBy and Value Counts
Pandas: How to Use GroupBy with Bin Counts