dplyr: How to Filter Based on Factor


You can use the following methods in dplyr to filter the rows of a data frame in R based on a factor variable:

Method 1: Filter Based on Factor Labels

library(dplyr)

#filter rows where team column is equal to factor label 'A' or 'C'
df %>% 
  filter(team %in% c('A', 'C'))

Method 2: Filter Based on Factor Levels

library(dplyr)

#filter rows where the integer code of the team factor level is greater than 2
df %>% 
  filter(as.integer(team) > 2)

The following examples show how to use each method in practice with the following data frame in R that contains information about various basketball players:

#create data frame
df <- data.frame(team=as.factor(c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'D')),
                 points=c(12, 34, 20, 25, 22, 28, 34, 19))

#view data frame
df

  team points
1    A     12
2    A     34
3    A     20
4    B     25
5    B     22
6    C     28
7    C     34
8    D     19

Example 1: Filter Based on Factor Labels

We can use the following syntax to filter the data frame to only contain rows where the factor labels of the team column are equal to A or C:

library(dplyr)

#filter rows where team column is equal to factor label 'A' or 'C'
df %>% 
  filter(team %in% c('A', 'C'))

  team points
1    A     12
2    A     34
3    A     20
4    C     28
5    C     34

Notice that the resulting data frame only contains rows where the value in the team column is equal to either A or C.
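As a minimal sketch of why this works, the comparison is made against the factor labels, so the same rows can also be selected with == and the | operator. The object names res_in and res_eq below are illustrative only and do not appear in the example above.

```r
library(dplyr)

# same data frame as above
df <- data.frame(team=as.factor(c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'D')),
                 points=c(12, 34, 20, 25, 22, 28, 34, 19))

# %in% version from the example
res_in <- df %>%
  filter(team %in% c('A', 'C'))

# equivalent version using == and the | (or) operator
res_eq <- df %>%
  filter(team == 'A' | team == 'C')

# both approaches keep the same rows
all.equal(res_in, res_eq)
# [1] TRUE

# filter() does not drop unused levels, so 'B' and 'D' are still listed
levels(res_in$team)
# [1] "A" "B" "C" "D"
```

The %in% form is usually easier to read once more than two labels are involved.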

Example 2: Filter Based on Factor Levels

We can use the following syntax to filter the data frame to only contain rows where the integer code of the team factor level is greater than 2, that is, rows whose level comes after the second one in the level order:

library(dplyr)

#filter rows where the integer code of the team factor level is greater than 2
df %>%
  filter(as.integer(team) > 2)

  team points
1    C     28
2    C     34
3    D     19

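In this particular example, the as.integer function returns the underlying integer code of each value in the team column, which is the position of that value's level in levels(df$team). Since as.factor orders the levels alphabetically here, the codes follow that order.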

For example:

  • Factor level ‘A’ becomes 1.
  • Factor level ‘B’ becomes 2.
  • Factor level ‘C’ becomes 3.
  • Factor level ‘D’ becomes 4.

Thus, when we filter for rows where the factor level is greater than 2, only the rows with a value of C or D in the team column are kept.
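The mapping above can be checked directly. The sketch below also shows that the integer codes depend on the level order, so the same filter keeps different rows if the levels are reordered. The names df_rev and res_rev and the reversed level order are assumptions made for illustration and are not part of the original example.

```r
library(dplyr)

# same data frame as above
df <- data.frame(team=as.factor(c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'D')),
                 points=c(12, 34, 20, 25, 22, 28, 34, 19))

# the level order determines the integer codes
levels(df$team)
# [1] "A" "B" "C" "D"

as.integer(df$team)
# [1] 1 1 1 2 2 3 3 4

# hypothetical copy with the level order reversed
df_rev <- df
df_rev$team <- factor(df_rev$team, levels=c('D', 'C', 'B', 'A'))

as.integer(df_rev$team)
# [1] 4 4 4 3 3 2 2 1

# the same filter now keeps the rows for teams A and B instead of C and D
res_rev <- df_rev %>%
  filter(as.integer(team) > 2)
```

Because of this, it is worth inspecting levels() before filtering on integer codes, especially when the factor was created with an explicit levels argument.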
