Pandas: How to Use Group By with Where Condition


The easiest way to use group by with a where condition in pandas is to use the query() function:

df.query("team == 'A'").groupby(["position"])["points"].mean().reset_index()

This particular example example calculates the mean value of points, grouped by position, where team is equal to ‘A’ in some pandas DataFrame.

The following example shows how to use this syntax in practice.

Example: How to Use Group By with Where Condition in Pandas

Suppose we have the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'F', 'F', 'F', 'G', 'G', 'F'],
                   'points': [22, 14, 15, 10, 8, 29, 33, 18]})

#view DataFrame
print(df)

  team position  points
0    A        G      22
1    A        G      14
2    A        F      15
3    A        F      10
4    A        F       8
5    B        G      29
6    B        G      33
7    B        F      18

We can use the following code to calculate the mean value of points, grouped by position, where team is equal to ‘A’:

#calculate mean value of points, grouped by position, where team == 'A'
df.query("team == 'A'").groupby(["position"])["points"].mean().reset_index()

        position  points
0	F	  11.0
1	G	  18.0

From the output we can see:

  • The mean points value for players in position ‘F’ is on team A is 11.
  • The mean points value for players in position ‘G’ on team A is 18.

Note that we can also use the & operator in the query() function to query for rows where multiple conditions are met.

For example, the following code shows how to calculate the mean value of points, grouped by position, where team is equal to ‘A’ and position is equal to ‘G’:

#calculate mean value of points by position where team is 'A' and position is 'G'
df.query("team=='A' & position=='G'").groupby(["position"])["points"].mean().reset_index()

	position  points
0	G	  18.0

From the output we can see that the mean points value for players in position ‘G’ on team A is 18.

Since we specified two conditions in the query() function, only the rows that met both conditions were used.

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Perform a GroupBy Sum in Pandas
How to Use Groupby and Plot in Pandas
How to Count Unique Values Using GroupBy in Pandas

Featured Posts

Leave a Reply

Your email address will not be published. Required fields are marked *