How to Use a Conditional Filter in dplyr


You can use the following basic syntax to apply a conditional filter on a data frame using functions from the dplyr package in R:

library(dplyr)

#filter data frame where points is greater than some value (based on team)
df %>% 
  filter(case_when(team=='A' ~ points > 15,
                   team=='B' ~ points > 20,
                   TRUE ~ points > 30))

This particular example keeps only the rows where the value in the points column exceeds a threshold, and that threshold depends on the value in the team column: 15 for team A, 20 for team B, and 30 for every other team.
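To see what the case_when() expression is doing, it can help to write the same rule with ordinary logical operators. The sketch below is one possible equivalent, not a recommended replacement. The small data frame is made up for illustration, the names with_case_when and with_boolean are hypothetical, and the comparison assumes there are no missing values in either column.

```r
library(dplyr)

#hypothetical data frame, for illustration only
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(10, 17, 18, 24, 29, 34))

#conditional filter written with case_when()
with_case_when <- df %>%
  filter(case_when(team=='A' ~ points > 15,
                   team=='B' ~ points > 20,
                   TRUE ~ points > 30))

#same rule written with & and | (one clause per team, plus a catch-all clause)
with_boolean <- df %>%
  filter((team=='A' & points > 15) |
         (team=='B' & points > 20) |
         (!team %in% c('A', 'B') & points > 30))

#check that both approaches keep the same rows
all.equal(with_case_when, with_boolean)
```

The case_when() form tends to be easier to read and extend as the number of groups grows, since each group gets its own line.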

Related: An Introduction to case_when() in dplyr

The following example shows how to use this syntax in practice.

Example: How to Use a Conditional Filter in dplyr

Suppose we have the following data frame in R that contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'),
                 points=c(10, 12, 17, 18, 24, 29, 29, 34, 35))

#view data frame
df

  team points
1    A     10
2    A     12
3    A     17
4    B     18
5    B     24
6    B     29
7    C     29
8    C     34
9    C     35

Now suppose we would like to apply the following conditional filter:

  • Only keep rows for players on team A where points is greater than 15
  • Only keep rows for players on team B where points is greater than 20
  • Only keep rows for players on team C where points is greater than 30

We can use the filter() and case_when() functions from the dplyr package to apply this conditional filter on the data frame:

library(dplyr)

#filter data frame where points is greater than some value (based on team)
df %>% 
  filter(case_when(team=='A' ~ points > 15,
                   team=='B' ~ points > 20,
                   TRUE ~ points > 30))

  team points
1    A     17
2    B     24
3    B     29
4    C     34
5    C     35

The result contains only the rows where the value in the points column exceeds the threshold for that row's team: 17 is the only team A value above 15, 24 and 29 are the team B values above 20, and 34 and 35 are the team C values above 30.
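One way to check a conditional filter before applying it is to store the result of case_when() as a column and inspect it. The following is a minimal sketch using the same data frame; the names flagged and keep are hypothetical and chosen only for this illustration.

```r
library(dplyr)

#same data frame as in the example above
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'),
                 points=c(10, 12, 17, 18, 24, 29, 29, 34, 35))

#store the logical vector built by case_when() in a new column
flagged <- df %>%
  mutate(keep = case_when(team=='A' ~ points > 15,
                          team=='B' ~ points > 20,
                          TRUE ~ points > 30))

#view data frame with the new column
flagged
```

The keep column is TRUE for rows 3, 5, 6, 8 and 9, which are exactly the rows that filter() retains.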

Note #1: In the case_when() function, TRUE in the last argument acts as a catch-all condition for any value in the team column that is not equal to ‘A’ or ‘B’. In this data frame, that means the rows for team C.
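To illustrate why the catch-all matters, the sketch below runs the same filter without the TRUE line. When no condition matches a row, case_when() returns NA for that row, and filter() drops rows where the condition is NA, so every team C row disappears. The name no_catch_all is hypothetical.

```r
library(dplyr)

#same data frame as in the example above
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'),
                 points=c(10, 12, 17, 18, 24, 29, 29, 34, 35))

#same filter without the TRUE catch-all: team C rows evaluate to NA and are dropped
no_catch_all <- df %>%
  filter(case_when(team=='A' ~ points > 15,
                   team=='B' ~ points > 20))

#view filtered data frame
no_catch_all
```

Only the team A row with 17 points and the team B rows with 24 and 29 points remain. In dplyr 1.1.0 and later, the catch-all can also be written with the .default argument of case_when() in place of the TRUE line.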

Note #2: The complete documentation for the case_when() function is available in the dplyr package reference.

Additional Resources

The following tutorials explain how to perform other common tasks in dplyr:

How to Filter by Row Number Using dplyr
How to Filter by Multiple Conditions Using dplyr
How to Use a “not in” Filter in dplyr
