You can use the following basic syntax to get the top N rows by group in a pandas DataFrame:
df.groupby('group_column').head(2).reset_index(drop=True)
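This particular syntax returns the first 2 rows of each group, in the order they appear in the DataFrame. The reset_index(drop=True) call discards the original row labels and numbers the result from 0.
To return a different number of rows per group, change the value passed to head().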
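A minimal sketch of the same pattern with the row count as a parameter, assuming you want the first n rows of each group in their original order. The helper name first_n_per_group and the small DataFrame are hypothetical illustrations, not part of pandas or of the examples below.

```python
import pandas as pd

def first_n_per_group(df, group_cols, n):
    """Hypothetical helper: return the first n rows of each group.

    Rows keep their original order; reset_index(drop=True) renumbers
    the result from 0 and discards the old row labels.
    """
    return df.groupby(group_cols).head(n).reset_index(drop=True)

# hypothetical example data
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B'],
                   'points': [5, 7, 9, 12, 4, 8]})

# first 3 rows for each team; team B only has 2 rows, so both are kept
top3 = first_n_per_group(df, 'team', 3)
print(top3)
```

A group with fewer than n rows is returned whole rather than raising an error.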
The following examples show how to use this syntax with this pandas DataFrame:
import pandas as pd

# create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [5, 7, 7, 9, 12, 9, 9, 4, 7, 7]})

# view DataFrame
print(df)

  team position  points
0    A        G       5
1    A        G       7
2    A        G       7
3    A        F       9
4    A        F      12
5    B        G       9
6    B        G       9
7    B        F       4
8    B        F       7
9    B        F       7
Example 1: Get Top N Rows Grouped by One Column
The following code shows how to return the first 2 rows for each value of the team column:
# get the first 2 rows of each team
df.groupby('team').head(2).reset_index(drop=True)

  team position  points
0    A        G       5
1    A        G       7
2    B        G       9
3    B        G       9
The output displays the first 2 rows for each team, in the order they appear in the original DataFrame.
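Because head() takes rows in the order they already appear, the rows above are the first two for each team, not necessarily the two with the most points. If you want the highest values per group, one common approach is to sort before grouping. This is a sketch assuming points is the column to rank by; kind='stable' is used so that tied rows keep their original relative order.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [5, 7, 7, 9, 12, 9, 9, 4, 7, 7]})

# sort by points, highest first; a stable sort keeps tied rows in their original order
ranked = df.sort_values('points', ascending=False, kind='stable')

# the first 2 rows of each team in the sorted frame are that team's 2 highest scores
top_points = ranked.groupby('team').head(2).reset_index(drop=True)
print(top_points)
```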
Example 2: Get Top N Rows Grouped by Multiple Columns
The following code shows how to return the first 2 rows for each combination of the team and position columns:
# get the first 2 rows of each team/position combination
df.groupby(['team', 'position']).head(2).reset_index(drop=True)

  team position  points
0    A        G       5
1    A        G       7
2    A        F       9
3    A        F      12
4    B        G       9
5    B        G       9
6    B        F       4
7    B        F       7
The output displays the first 2 rows for each combination of team and position. Every combination has at least 2 rows, so the result contains 8 rows.
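Leaving off reset_index(drop=True) keeps the original row labels, which makes it easy to check which rows head() selected and which it skipped. A small sketch using the same DataFrame; the variable names kept and dropped are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [5, 7, 7, 9, 12, 9, 9, 4, 7, 7]})

# without reset_index, the result keeps the labels of the original rows
kept = df.groupby(['team', 'position']).head(2).index.tolist()

# rows not selected: the third A/G row and the third B/F row
dropped = df.index.difference(kept).tolist()
print(kept, dropped)
```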
Additional Resources
The following tutorials explain how to perform other common operations in pandas:
Pandas: How to Find Unique Values in a Column
Pandas: How to Find Unique Values in Multiple Columns
Pandas: How to Count Occurrences of Specific Value in Column