R: How to Use grep() to Find Exact Match


You can use the grep() function in R to find the positions of the elements in a vector that match a particular pattern.
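As a quick illustration, here is a small made-up vector named teams (it is not part of the original example). By default grep() returns the positions of the matching elements, the value = TRUE argument returns the matching elements themselves, and the related grepl() function returns TRUE or FALSE for each element:

```r
#create a small made-up vector of team names
teams <- c('Mavs', 'Hawks', 'Mavs2', 'Kings')

#grep() returns the positions of the elements that contain Mavs (1 and 3)
idx <- grep('Mavs', teams)
print(idx)

#value = TRUE returns the matching elements instead of their positions
vals <- grep('Mavs', teams, value = TRUE)
print(vals)

#grepl() returns one TRUE/FALSE value per element
hits <- grepl('Mavs', teams)
print(hits)
```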

Often you may want grep() to return only the elements that match a pattern exactly, rather than every element that merely contains the pattern as part of a longer string.

To do so, you can wrap the pattern in \\b, the regular expression anchor for a word boundary. (The backslash is written twice because it must itself be escaped inside an R string.) With \\b on both sides, grep() only matches the pattern when it appears as a whole word.
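To see where a word boundary does and does not occur, the minimal sketch below runs grepl() over a made-up vector of team names chosen only for illustration. Note that '\\b' typed in R code is the two-character regular expression \b:

```r
#made-up team names to test the word-boundary pattern
teams <- c('Mavs', 'Mavs2', 'NewMavs', 'Mavs 2', 'The Mavs')

#the R string '\\b' holds two characters: a backslash and the letter b
nchar('\\b')

#TRUE where Mavs appears as a whole word
whole_word <- grepl('\\bMavs\\b', teams)
print(whole_word)
```

Here Mavs2 and NewMavs are excluded because a letter or digit touches Mavs directly, while Mavs 2 and The Mavs are still matched because a space next to Mavs counts as a word boundary.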

You can use the following basic syntax to apply this to a column of a data frame:

#create new data frame that contains rows with Mavs in team column
df_new <- df[grep('\\bMavs\\b', df$team), ]

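This particular example returns all rows from the data frame named df in which the team column contains Mavs as a whole word.

It will not return rows with values such as Mavs2 or NewMavs, because a letter, digit, or underscore directly touching Mavs means there is no word boundary at that position. Keep in mind that \\b only looks at the characters immediately before and after the pattern, so values such as Mavs 2 or The Mavs would still be returned. To require the entire string to be exactly Mavs, use the anchored pattern '^Mavs$' instead.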
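If the goal is a strict match of the entire string, one option is to anchor the pattern with ^ (start of string) and $ (end of string); another is to skip regular expressions and compare with ==. Here is a minimal sketch using a made-up vector that includes values containing Mavs as a separate word:

```r
#made-up team names, some containing Mavs as a separate word
teams <- c('Mavs', 'Mavs2', 'NewMavs', 'Mavs 2', 'The Mavs')

#^ and $ anchor the start and end of the string, so only 'Mavs' itself matches
exact_idx <- grep('^Mavs$', teams)
print(exact_idx)

#a literal comparison finds the same position without a regular expression
exact_eq <- which(teams == 'Mavs')
print(exact_eq)
```

Both approaches return only position 1, whereas '\\bMavs\\b' would also match Mavs 2 and The Mavs.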

The following example shows how to use grep() with \\b in practice.

Example: How to Use grep() to Find Exact Match in R

Suppose we create the following data frame in R that contains information about various basketball teams:

#create data frame
df <- data.frame(team=c('Mavs', 'Hawks', 'Nets', 'Heat', 'Cavs', 'Mavs2', 'Kings'),
                 points=c(104, 115, 124, 120, 112, 140, 112),
                 status=c('Bad', 'Good', 'Excellent', 'Great', 'Bad', 'Great', 'Bad'))

#view data frame
df

   team points    status
1  Mavs    104       Bad
2 Hawks    115      Good
3  Nets    124 Excellent
4  Heat    120     Great
5  Cavs    112       Bad
6 Mavs2    140     Great
7 Kings    112       Bad

Suppose we would like to use the grep() function to return every row in the data frame whose team name is exactly Mavs.

We might first try the following syntax:

#create new data frame that contains rows with Mavs in team column
df_new <- df[grep('Mavs', df$team), ]

#view new data frame
df_new

   team points status
1  Mavs    104    Bad
6 Mavs2    140  Great

Notice that this returns every row whose value in the team column contains Mavs anywhere in the string, including the row with a team value of Mavs2.

If we would like to return only rows that match Mavs without any extra characters attached, we can wrap the pattern in \\b in the grep() function as follows:

#create new data frame that contains rows with Mavs as a whole word in team column
df_new <- df[grep('\\bMavs\\b', df$team), ]

#view new data frame
df_new

  team points status
1 Mavs    104    Bad

Notice that this returns only the one row in which the value in the team column is exactly Mavs.

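By using \\bMavs\\b in the grep() function, we specify that Mavs must not be directly preceded or followed by a word character (a letter, digit, or underscore). Because every team name in this data frame is a single word, this behaves like an exact match here; for values that could contain spaces or punctuation, '^Mavs$' is the stricter choice.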
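Because \b is defined in terms of word characters (letters, digits, and the underscore), punctuation next to Mavs still counts as a boundary while an underscore does not. The made-up values below are only for illustration:

```r
#made-up values with different characters directly after Mavs
teams <- c('Mavs-2', 'Mavs.2', 'Mavs_2', 'Mavs2')

#a hyphen or period creates a word boundary; an underscore or digit does not
result <- grepl('\\bMavs\\b', teams)
print(result)
```

So the first two values are matched and the last two are not.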

Note: We used the general pattern df[grep(…), ] because grep() returns the positions of the matching rows, and leaving the column position after the comma empty returns all columns for those rows.
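As a sketch of an equivalent approach, the same rows can be selected with grepl(), which returns a logical vector that works directly as a row index. The snippet rebuilds the example data frame so it runs on its own:

```r
#recreate the example data frame
df <- data.frame(team=c('Mavs', 'Hawks', 'Nets', 'Heat', 'Cavs', 'Mavs2', 'Kings'),
                 points=c(104, 115, 124, 120, 112, 140, 112),
                 status=c('Bad', 'Good', 'Excellent', 'Great', 'Bad', 'Great', 'Bad'))

#grep() returns row positions; the empty slot after the comma keeps all columns
rows_grep <- df[grep('\\bMavs\\b', df$team), ]

#grepl() returns TRUE/FALSE for each row, which also works as a row index
rows_grepl <- df[grepl('\\bMavs\\b', df$team), ]

#view the result
rows_grepl
```

Since grepl() always returns one value per row, it is convenient when combining the pattern with other conditions using & or |.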

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Concatenate Vector of Strings in R
How to Extract Numbers from Strings in R
How to Remove Spaces from Strings in R
How to Compare Strings in R
