Pandas: How to Filter Rows After Using GroupBy()


Often you may want to filter the rows of a pandas DataFrame after using the GroupBy() function to initially group the rows based on a particular column.

You can use the following basic syntax to do so:

df.groupby('some_column').filter(lambda x: some condition)

By using this syntax, you can group the rows of a pandas DataFrame by one or more specific columns, then filter the grouped rows to only show the “groups” that meet a particular condition.

The following examples show how to use this syntax in practice.

Example: How to Filter Rows After Using GroupBy() in Pandas

Suppose we have a pandas DataFrame that contains three columns with values for various basketball players:

  • team: The team name
  • position: The player position
  • points: Total points scored by the player:

We can use the following syntax to create this DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                   'position': ['F', 'G', 'F', 'G', 'F', 'G', 'F', 'G'],
                   'points': [20, 22, 16, 40, 14, 16, 31, 35]})

#view DataFrame
print(df)

  team position  points
0    A        F      20
1    A        G      22
2    A        F      16
3    B        G      40
4    B        F      14
5    B        G      16
6    C        F      31
7    C        G      35

Suppose that we would like to group the rows based on the value in the team column and then filter to only show the teams that have a mean value in the points column greater than 22.

We can use the following syntax to do so:

#group by team column and filter for teams with mean points > 22
df.groupby('team').filter(lambda x: x['points'].mean() > 22)

        team	position points
3	B	G	 40
4	B	F	 14
5	B	G	 16
6	C	F	 31
7	C	G	 35

This returns only the rows where the value in the team column is either B or C because these are the two teams that have a mean points value greater than 22.

We can use similar syntax to filter based on another metric.

For example, we can use the following syntax to group the rows by the team column and then filter to only show the teams that have a sum of points less than 60:

#group by team column and filter for teams with sum of points < 60
df.groupby('team').filter(lambda x: x['points'].sum() < 60)

        team	position points
0	A	F	 20
1	A	G	 22
2	A	F	 16

This returns only the rows where the value in the team column is A, which is the only team that has a sum of points values less than 60.

If you would like to group by multiple columns, then you can use brackets inside the GroupBy() function.

For example, you can use the following syntax to group the rows by the values in the team and position columns, then filter to only show the rows where the team-position combo has a sum of points greater than 35:

#group by team and position, then filter for points > 35
df.groupby(['team', 'position']).filter(lambda x: x['points'].sum() > 35)

        team	position points
0	A	F	 20
2	A	F	 16
3	B	G	 40
5	B	G	 16

Notice that this only returns the rows where the team-position combo has a sum of points greater than 35.

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

Pandas: Get Index of Rows Whose Column Matches Value
Pandas: How to Select Columns Containing a Specific String
Pandas: How to Check if Column Contains String

Featured Posts

Leave a Reply

Your email address will not be published. Required fields are marked *