How to Filter Rows that Contain a Certain String Using dplyr


Often you may want to filter rows in a data frame in R that contain a certain string. You can do this by combining the filter() function from the dplyr package with the grepl() function from base R, which tests each value against a pattern and returns TRUE or FALSE.

This tutorial shows several examples of how to use these functions in practice using the following data frame:

#create data frame
df <- data.frame(player = c('P Guard', 'S Guard', 'S Forward', 'P Forward', 'Center'),
                 points = c(12, 15, 19, 22, 32),
                 rebounds = c(5, 7, 7, 12, 11))

#view data frame
df

     player points rebounds
1   P Guard     12        5
2   S Guard     15        7
3 S Forward     19        7
4 P Forward     22       12
5    Center     32       11

Example 1: Filter Rows that Contain a Certain String

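The following code shows how to keep only the rows that contain the string ‘Guard’ in the player column. grepl() returns TRUE for each value that matches the pattern, and filter() keeps those rows: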

#load dplyr package
library(dplyr)

#filter rows that contain the string 'Guard' in the player column
df %>% filter(grepl('Guard', player))

   player points rebounds
1 P Guard     12        5
2 S Guard     15        7
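
Note that grepl() treats its first argument as a regular expression and is case-sensitive by default. The following is a minimal sketch of two variations, using the same data frame as above: ignore.case = TRUE to match regardless of case, and fixed = TRUE to treat the pattern as a literal string. The object names guards_any_case and guards_literal are chosen here for illustration only:

```r
#load dplyr package
library(dplyr)

#recreate the data frame from the start of the tutorial
df <- data.frame(player = c('P Guard', 'S Guard', 'S Forward', 'P Forward', 'Center'),
                 points = c(12, 15, 19, 22, 32),
                 rebounds = c(5, 7, 7, 12, 11))

#match 'guard' regardless of upper or lower case
guards_any_case <- df %>% filter(grepl('guard', player, ignore.case = TRUE))

#treat 'Guard' as a literal string rather than a regular expression
guards_literal <- df %>% filter(grepl('Guard', player, fixed = TRUE))

guards_any_case
guards_literal
```

With this data frame, both calls should return the same two rows as the example above. The fixed = TRUE form is mainly useful when the search string contains characters such as '.' or '(' that have a special meaning in regular expressions.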

Example 2: Filter Rows that Contain at Least One of Several Strings

The following code shows how to filter rows that contain ‘Guard’ or ‘Forward’ in the player column. The | character in the pattern means "or":

#filter rows that contain 'Guard' or 'Forward' in the player column
df %>% filter(grepl('Guard|Forward', player))

     player points rebounds
1   P Guard     12        5
2   S Guard     15        7
3 S Forward     19        7
4 P Forward     22       12

The following code shows how to filter rows that contain ‘P’ or ‘Center’ in the player column. Because grepl() is case-sensitive by default, ‘P’ matches only values that contain an uppercase P:

#filter rows that contain 'P' or 'Center' in the player column
df %>% filter(grepl('P|Center', player))

     player points rebounds
1   P Guard     12        5
2 P Forward     22       12
3    Center     32       11
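
When the list of strings is long or stored in a vector, the pattern can be built with paste() instead of typed by hand. The following is a small sketch under that assumption; the names keywords and pattern are hypothetical and used only for illustration:

```r
#load dplyr package
library(dplyr)

#recreate the data frame from the start of the tutorial
df <- data.frame(player = c('P Guard', 'S Guard', 'S Forward', 'P Forward', 'Center'),
                 points = c(12, 15, 19, 22, 32),
                 rebounds = c(5, 7, 7, 12, 11))

#strings to search for (hypothetical vector)
keywords <- c('Guard', 'Forward')

#join the strings with '|' to get the pattern 'Guard|Forward'
pattern <- paste(keywords, collapse = '|')

#filter rows that contain any of the strings in the player column
matches <- df %>% filter(grepl(pattern, player))
matches
```

This is equivalent to writing grepl('Guard|Forward', player) directly. It assumes the strings contain no regular expression special characters.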

Example 3: Filter Out Rows that Contain a Certain String

The following code shows how to filter out (i.e. remove) rows that contain ‘Guard’ in the player column:

#filter out rows that contain 'Guard' in the player column
df %>% filter(!grepl('Guard', player))

     player points rebounds
1 S Forward     19        7
2 P Forward     22       12
3    Center     32       11

The following code shows how to filter out (i.e. remove) rows that contain ‘Guard’ or ‘Center’ in the player column:

#filter out rows that contain 'Guard' or 'Center' in the player column
df %>% filter(!grepl('Guard|Center', player))

     player points rebounds
1 S Forward     19        7
2 P Forward     22       12
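
One detail to keep in mind when filtering out rows: grepl() returns FALSE for missing values, so !grepl() evaluates to TRUE and rows with NA in the column are kept. The following sketch illustrates this with a small made-up data frame (df_na and its values are hypothetical and not part of the data used above):

```r
#load dplyr package
library(dplyr)

#hypothetical data frame with a missing player value
df_na <- data.frame(player = c('P Guard', NA, 'Center'),
                    points = c(12, 15, 32))

#filter out rows that contain 'Guard'; the NA row is kept
kept <- df %>% filter(TRUE) %>% head(0) #placeholder removed below
kept <- df_na %>% filter(!grepl('Guard', player))

#also drop rows where player is missing
kept_no_na <- df_na %>% filter(!is.na(player), !grepl('Guard', player))

kept
kept_no_na
```

If missing values should be removed as well, adding the !is.na() condition as in the second call is one way to do it.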
