R: How to Use grep() for Pattern Matching and Replacement


You can use the grep() function in R to find elements in a vector that match a particular pattern.

Often you may want to use the grep() function to find elements that match a particular pattern and then replace those elements with a new value.

You can use the following basic syntax to do so:

#replace each value in team column that contains 'avs' with 'TEAM'
df$team[grep('avs', df$team)] <- 'TEAM'

This example finds each value in the team column of the data frame that contains the pattern avs and replaces it with the new value TEAM.

Note that grep() returns the positions of the elements that match the pattern. We use those positions to select the matching values in the team column and then use the <- operator to assign the new value to them.
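To make the two steps easier to see, here is a minimal sketch that separates the matching from the assignment. The vector teams and the variable idx are hypothetical names used only for illustration; they are not part of the data frame used later in this tutorial.

```r
#hypothetical example vector (not the data frame used below)
teams <- c('Mavs', 'Hawks', 'Nets', 'Cavs')

#step 1: grep() returns the integer positions of the matching elements
idx <- grep('avs', teams)
idx
#[1] 1 4

#step 2: use those positions to assign the new value
teams[idx] <- 'TEAM'
teams
#[1] "TEAM"  "Hawks" "Nets"  "TEAM"
```

Storing the positions in a variable first can be handy when you want to inspect which elements matched before overwriting them.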

The following example shows how to use this syntax in practice.

Example: How to Use grep() for Pattern Matching and Replacement in R

Suppose we create the following data frame in R that contains information about various basketball teams:

#create data frame
df <- data.frame(team=c('Mavs', 'Hawks', 'Nets', 'Heat', 'Cavs', 'Mavs2', 'Kings'),
                 points=c(104, 115, 124, 120, 112, 140, 112),
                 status=c('Bad', 'Good', 'Excellent', 'Great', 'Bad', 'Great', 'Bad'))

#view data frame
df

   team points    status
1  Mavs    104       Bad
2 Hawks    115      Good
3  Nets    124 Excellent
4  Heat    120     Great
5  Cavs    112       Bad
6 Mavs2    140     Great
7 Kings    112       Bad

Suppose that we would like to use the grep() function to identify each value in the team column that contains the pattern avs and then replace each of those values with the new value TEAM.

We can use the following syntax to do so:

#replace each value in team column that contains 'avs' with 'TEAM'
df$team[grep('avs', df$team)] <- 'TEAM'

#view new data frame
df

   team points    status
1  TEAM    104       Bad
2 Hawks    115      Good
3  Nets    124 Excellent
4  Heat    120     Great
5  TEAM    112       Bad
6  TEAM    140     Great
7 Kings    112       Bad

Notice that each value in the team column that contained the pattern avs has been replaced with the string TEAM.

Specifically, we can see that the following replacements were made:

  • Mavs was replaced by TEAM
  • Cavs was replaced by TEAM
  • Mavs2 was replaced by TEAM

Each of the three replaced strings contained avs somewhere in the original string, which is why Mavs2 was replaced along with Mavs and Cavs.
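By default, grep() interprets the pattern as a regular expression, so the match can be narrowed if replacing Mavs2 is not what you want. The following is only a sketch of one possible variation, not something the example above requires: it uses the $ anchor to match values that end in avs. The names teams and ends_with_avs are hypothetical.

```r
#same team names as in the data frame above
teams <- c('Mavs', 'Hawks', 'Nets', 'Heat', 'Cavs', 'Mavs2', 'Kings')

#'avs$' only matches strings that end in 'avs', so 'Mavs2' is skipped
ends_with_avs <- grep('avs$', teams)
ends_with_avs
#[1] 1 5

#replace only those values
teams[ends_with_avs] <- 'TEAM'
teams
```

With this pattern, Mavs and Cavs are replaced while Mavs2 is left unchanged.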

It’s important to note that the grep() function is case-sensitive by default, so the pattern that we search for must match the case of the text in the data frame.

For example, if we searched for the pattern AVS in the grep() function instead, no replacements would be made because no strings in the team column contain AVS in all uppercase letters.

Keep this in mind when using the grep() function to find matching patterns in a data frame and replacing those patterns.
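The short sketch below illustrates this behavior on a plain vector and shows the ignore.case argument of grep(), which can be set to TRUE when the case of the pattern should not matter. The names teams, no_match and ci_match are hypothetical and used only for this illustration.

```r
#same team names as in the data frame above
teams <- c('Mavs', 'Hawks', 'Nets', 'Heat', 'Cavs', 'Mavs2', 'Kings')

#default behavior is case-sensitive, so 'AVS' matches nothing
no_match <- grep('AVS', teams)
no_match
#integer(0)

#with ignore.case=TRUE the same pattern matches 'Mavs', 'Cavs' and 'Mavs2'
ci_match <- grep('AVS', teams, ignore.case=TRUE)
ci_match
#[1] 1 5 6
```

Because assigning to an empty set of positions changes nothing, a case mismatch does not raise an error; the data frame is simply left as it was.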

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Concatenate Vector of Strings in R
How to Extract Numbers from Strings in R
How to Remove Spaces from Strings in R
How to Compare Strings in R
