R: How to Use grep() to Not Include Specific Matches


You can use the grep() function in R to find elements in a vector that match a particular pattern.

Often, however, you may want to find the elements that do not match a particular pattern.

You can do so by using the ! operator with grepl(), the variant of grep() that returns a logical vector. The ! operator represents “NOT” logic in R.

The basic syntax is as follows:

#create new data frame that contains rows that do not match 'avs' in team column
df_new <- df[!grepl('avs', df$team), ]

This particular example will return all rows from the data frame named df in which the team column does not match the pattern ‘avs’ anywhere in the string.

The following example shows how to use this syntax in practice.

Related: How to Use Case-Insensitive grep() in R

Example: How to Use grep() to Not Include Specific Matches in R

Suppose we create the following data frame in R that contains information about various basketball teams:

#create data frame
df <- data.frame(team=c('Mavs', 'Hawks', 'Nets', 'Heat', 'Cavs', 'Mavs2', 'Kings'),
                 points=c(104, 115, 124, 120, 112, 140, 112),
                 status=c('Bad', 'Good', 'Excellent', 'Great', 'Bad', 'Great', 'Bad'))

#view data frame
df

   team points    status
1  Mavs    104       Bad
2 Hawks    115      Good
3  Nets    124 Excellent
4  Heat    120     Great
5  Cavs    112       Bad
6 Mavs2    140     Great
7 Kings    112       Bad

Suppose that we would like to use the grepl() function to return each of the rows in the data frame that do not match the pattern ‘avs’ in the team column.

We can use the following syntax to do so:

#create new data frame that contains rows that do not match 'avs' in team column
df_new <- df[!grepl('avs', df$team), ]

#view new data frame
df_new

   team points    status
2 Hawks    115      Good
3  Nets    124 Excellent
4  Heat    120     Great
7 Kings    112       Bad

Notice that this returns all rows in the data frame that do not contain avs anywhere in the string in the team column.

Specifically, we can see that the new data frame contains the following strings in the team column:

  • Hawks
  • Nets
  • Heat
  • Kings

None of these team names contains the pattern avs.
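The same filtering can also be expressed with grep() itself instead of grepl(). The sketch below is a minimal illustration that relies on the invert and value arguments of base R's grep(); the names idx and df_alt are hypothetical and are not part of the original example.

```r
#recreate the data frame from the example above
df <- data.frame(team=c('Mavs', 'Hawks', 'Nets', 'Heat', 'Cavs', 'Mavs2', 'Kings'),
                 points=c(104, 115, 124, 120, 112, 140, 112),
                 status=c('Bad', 'Good', 'Excellent', 'Great', 'Bad', 'Great', 'Bad'))

#invert=TRUE returns the positions of elements that do NOT match 'avs'
idx <- grep('avs', df$team, invert=TRUE)

#use those positions to select the non-matching rows
df_alt <- df[idx, ]

#value=TRUE returns the non-matching strings instead of their positions
grep('avs', df$team, invert=TRUE, value=TRUE)
```

Here df_alt should contain the same four rows as df_new. Note that negative indexing such as df[-grep('avs', df$team), ] is a less safe alternative: if no element matches, grep() returns an empty vector and the result has zero rows rather than all rows.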

Also note that you can use the | operator as “OR” logic to specify multiple patterns to exclude:

#create new data frame that contains rows that do not match 'avs' or 'ets' in team
df_new <- df[!grepl('avs|ets', df$team), ]

#view new data frame
df_new

   team points status
2 Hawks    115   Good
4  Heat    120  Great
7 Kings    112    Bad

Notice that this returns all rows in the data frame in which the team column matches neither avs nor ets, so Nets is now excluded as well.

You can chain as many patterns as you like with the | operator.
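When the list of patterns is long, typing them out by hand can get tedious. One possible approach, sketched below, is to store the patterns in a vector and join them with paste(). The vector patterns and its third entry 'ings' are assumptions added for illustration only.

```r
#recreate the data frame from the example above
df <- data.frame(team=c('Mavs', 'Hawks', 'Nets', 'Heat', 'Cavs', 'Mavs2', 'Kings'),
                 points=c(104, 115, 124, 120, 112, 140, 112),
                 status=c('Bad', 'Good', 'Excellent', 'Great', 'Bad', 'Great', 'Bad'))

#hypothetical vector of patterns to exclude
patterns <- c('avs', 'ets', 'ings')

#join the patterns with | to build the single pattern 'avs|ets|ings'
pattern <- paste(patterns, collapse='|')

#create new data frame that contains rows that do not match any of the patterns
df_new <- df[!grepl(pattern, df$team), ]

#view new data frame (only the Hawks and Heat rows remain)
df_new
```

Keep in mind that each entry is treated as a regular expression, so patterns containing special characters such as . or ( would need to be escaped first.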

Note: We used the general pattern df[!grepl(…), ] to specify that we would like to return rows that do not match a particular pattern and return all columns for those particular rows.
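If only some columns are needed, the empty position after the comma can be filled with a vector of column names. The following is a minimal sketch of that variation; the name df_sub is hypothetical.

```r
#recreate the data frame from the example above
df <- data.frame(team=c('Mavs', 'Hawks', 'Nets', 'Heat', 'Cavs', 'Mavs2', 'Kings'),
                 points=c(104, 115, 124, 120, 112, 140, 112),
                 status=c('Bad', 'Good', 'Excellent', 'Great', 'Bad', 'Great', 'Bad'))

#keep rows that do not match 'avs' and return only the team and points columns
df_sub <- df[!grepl('avs', df$team), c('team', 'points')]

#view new data frame
df_sub
```

The row condition before the comma is unchanged, so df_sub should hold the same four rows as before with the status column dropped.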

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Concatenate Vector of Strings in R
How to Extract Numbers from Strings in R
How to Remove Spaces from Strings in R
How to Compare Strings in R
