Pandas: How to Use describe() by Group


You can use the describe() function to generate descriptive statistics for variables in a pandas DataFrame.

You can use the following basic syntax to combine describe() with the groupby() function in pandas:

df.groupby('group_var')['values_var'].describe()

The following example shows how to use this syntax in practice.

Example: Use describe() by Group in Pandas

Suppose we have the following pandas DataFrame that contains information about basketball players on two different teams:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [8, 12, 14, 14, 15, 22, 27, 24],
                   'assists': [2, 2, 3, 5, 7, 6, 8, 12]})

#view DataFrame
print(df)

  team  points  assists
0    A       8        2
1    A      12        2
2    A      14        3
3    A      14        5
4    B      15        7
5    B      22        6
6    B      27        8
7    B      24       12

We can use the describe() function along with the groupby() function to summarize the values in the points column for each team:

#summarize points by team
df.groupby('team')['points'].describe()

      count  mean       std   min    25%   50%    75%   max
team
A       4.0  12.0  2.828427   8.0  11.00  13.0  14.00  14.0
B       4.0  22.0  5.099020  15.0  20.25  23.0  24.75  27.0
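As a rough cross-check of the table above, a few of its columns can be rebuilt one statistic at a time from the same groupby object. This is only a sketch: the name manual is chosen here for illustration, and it assumes the pandas defaults (sample standard deviation, linear interpolation for quantiles).

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [8, 12, 14, 14, 15, 22, 27, 24],
                   'assists': [2, 2, 3, 5, 7, 6, 8, 12]})

grouped = df.groupby('team')['points']

# Rebuild some of the describe() columns one statistic at a time
manual = pd.DataFrame({
    'count': grouped.count(),
    'mean': grouped.mean(),
    'std': grouped.std(),            # sample standard deviation (ddof=1)
    '25%': grouped.quantile(0.25),   # linear interpolation between data points
    '50%': grouped.median(),
})

print(manual)
```

The values should line up with the matching columns of the describe() output, except that count is stored as an integer here rather than a float.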

From the output, we can see the following values for the points variable for each team:

  • count (number of observations)
  • mean (mean points value)
  • std (standard deviation of points values)
  • min (minimum points value)
  • 25% (25th percentile of points)
  • 50% (50th percentile (i.e. median) of points)
  • 75% (75th percentile of points)
  • max (maximum points value)
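The 25%, 50%, and 75% rows are only the default percentiles. As a minimal sketch, assuming you would rather see the 10th and 90th percentiles alongside the median, the percentiles argument of describe() can be passed through the grouped call (the choice of 0.1 and 0.9 is arbitrary):

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [8, 12, 14, 14, 15, 22, 27, 24],
                   'assists': [2, 2, 3, 5, 7, 6, 8, 12]})

# Request the 10th, 50th, and 90th percentiles instead of the default quartiles
summary = df.groupby('team')['points'].describe(percentiles=[0.1, 0.5, 0.9])

print(summary)
```

The resulting table keeps count, mean, std, min, and max, and replaces the quartile columns with 10%, 50%, and 90%.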

The output of describe() is already a DataFrame, with team as its index. If you’d like team to appear as a regular column instead, you can chain the reset_index() method:

#summarize points by team, with team as a regular column
df.groupby('team')['points'].describe().reset_index()

  team  count  mean       std   min    25%   50%    75%   max
0    A    4.0  12.0  2.828427   8.0  11.00  13.0  14.00  14.0
1    B    4.0  22.0  5.099020  15.0  20.25  23.0  24.75  27.0

The variable team is now a column in the DataFrame and the index values are 0 and 1.
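To make the effect of reset_index() concrete, the sketch below inspects the summary before and after the call. The variable names summary and flat are chosen here for illustration only.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [8, 12, 14, 14, 15, 22, 27, 24],
                   'assists': [2, 2, 3, 5, 7, 6, 8, 12]})

# Before reset_index(): a DataFrame whose index holds the team labels
summary = df.groupby('team')['points'].describe()
print(type(summary).__name__)
print(summary.index.name)

# After reset_index(): team is an ordinary column with a default integer index
flat = summary.reset_index()

# With team as a column, it can be selected alongside individual statistics
print(flat[['team', 'mean', 'std']])
```

Having team as a regular column is mainly a convenience for later steps such as merging or exporting, where a plain integer index is often easier to work with.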

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

Pandas: How to Calculate Cumulative Sum by Group
Pandas: How to Count Unique Values by Group
Pandas: How to Calculate Correlation By Group
