Pandas: How to Use GroupBy with nlargest()


You can use the following syntax to display the n largest values by group in a pandas DataFrame:

#display two largest values by group
df.groupby('group_var')['values_var'].nlargest(2)

And you can use the following syntax to perform some operation (like taking the sum) on the n largest values by group in a pandas DataFrame:

#find sum of two largest values by group
df.groupby('group_var')['values_var'].apply(lambda grp: grp.nlargest(2).sum())
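Both snippets above depend on how nlargest() handles ties. By default it uses keep='first', so each group returns exactly n rows even when several rows share the cut-off value. The minimal sketch below uses a small hypothetical DataFrame (df_ties, with the placeholder column names group_var and values_var). It contrasts the default with keep='all', which retains every tied row and can therefore return more than n rows for a group.

```python
import pandas as pd

#hypothetical data: group 'x' has a three-way tie at the top
df_ties = pd.DataFrame({'group_var': ['x', 'x', 'x', 'y', 'y'],
                        'values_var': [5, 5, 5, 3, 1]})

#default keep='first': exactly two rows per group, ties broken by row order
first_two = df_ties.groupby('group_var')['values_var'].nlargest(2)

#keep='all': every row tied with the 2nd-largest value is kept
with_ties = df_ties.groupby('group_var')['values_var'].nlargest(2, keep='all')

print(first_two)
print(with_ties)
```

Here first_two holds four rows (two per group), while with_ties holds five because all three tied rows in group 'x' are kept.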

The following examples show how to use each method in practice with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [12, 29, 34, 14, 10, 11, 7, 36, 34, 22]})

#view DataFrame
print(df)

  team  points
0    A      12
1    A      29
2    A      34
3    A      14
4    A      10
5    B      11
6    B       7
7    B      36
8    B      34
9    B      22

Example 1: Display N Largest Values by Group

We can use the following syntax to display the two largest points values grouped by team:

#display two largest points values grouped by team
df.groupby('team')['points'].nlargest(2)

team   
A     2    34
      1    29
B     7    36
      8    34
Name: points, dtype: int64

The output shows the two largest points values for each team. The result is a Series indexed by both the team and the row's index label in the original DataFrame (for example, labels 2 and 1 for team A).
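Because the second level of that index holds the original row labels, you can use it to pull back the complete rows (every column, not just points). This is a minimal sketch that rebuilds the same df. It assumes the default integer index, so the labels returned by nlargest() can be passed straight to df.loc.

```python
import pandas as pd

#same DataFrame as above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [12, 29, 34, 14, 10, 11, 7, 36, 34, 22]})

top2 = df.groupby('team')['points'].nlargest(2)

#level 0 of the MultiIndex is 'team'; level 1 holds the original row labels
row_labels = top2.index.get_level_values(1)

#look up the complete rows in the original DataFrame (rows 2, 1, 7 and 8)
top_rows = df.loc[row_labels]
print(top_rows)
```

With a DataFrame that has more columns, the same lookup returns all of them for the selected rows.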

Example 2: Perform Operation on N Largest Values by Group

We can use the following syntax to calculate the sum of the two largest points values grouped by team:

#calculate sum of two largest points values for each team
df.groupby('team')['points'].apply(lambda grp: grp.nlargest(2).sum())

team
A    63
B    70
Name: points, dtype: int64

Here’s how to interpret the output:

  • The sum of the two largest points values for team A is 63.
  • The sum of the two largest points values for team B is 70.
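If you prefer to avoid a lambda, one alternative is to sort the whole DataFrame once in descending order, keep the first two rows of each team with head(2), and then sum. This is a different technique from the apply() call above, shown here as a sketch. When values are tied, it may keep different rows than nlargest(), but the tied values are equal, so the sums match.

```python
import pandas as pd

#same DataFrame as above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [12, 29, 34, 14, 10, 11, 7, 36, 34, 22]})

#sort once (largest first), then keep the first two rows of each team
top2_rows = df.sort_values('points', ascending=False).groupby('team').head(2)

#sum the retained points per team
top2_sum = top2_rows.groupby('team')['points'].sum()

#the apply/lambda version from above, for comparison
via_apply = df.groupby('team')['points'].apply(lambda grp: grp.nlargest(2).sum())

print(top2_sum)
```

Both approaches give 63 for team A and 70 for team B.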

We can use similar syntax to calculate the mean of the two largest points values grouped by team:

#calculate mean of two largest points values for each team
df.groupby('team')['points'].apply(lambda grp: grp.nlargest(2).mean())

team
A    31.5
B    35.0
Name: points, dtype: float64

Here’s how to interpret the output:

  • The mean of the two largest points values for team A is 31.5.
  • The mean of the two largest points values for team B is 35.0.
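To compute the sum and the mean in one call, you can pass a list of functions to agg(). In this sketch the helper names top2_sum and top2_mean are hypothetical. agg() uses each function's name as the label of the corresponding output column.

```python
import pandas as pd

#same DataFrame as above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [12, 29, 34, 14, 10, 11, 7, 36, 34, 22]})

#hypothetical helpers: each receives one team's points as a Series
def top2_sum(grp):
    return grp.nlargest(2).sum()

def top2_mean(grp):
    return grp.nlargest(2).mean()

#one row per team, one column per helper
summary = df.groupby('team')['points'].agg([top2_sum, top2_mean])
print(summary)
```

The resulting columns contain the same values as the two examples above: 63 and 31.5 for team A, and 70 and 35.0 for team B.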

Note: You can find the complete documentation for GroupBy in the official pandas API reference.

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

Pandas: How to Calculate Cumulative Sum by Group
Pandas: How to Count Unique Values by Group
Pandas: How to Calculate Mode by Group
Pandas: How to Calculate Correlation By Group
