R: How to Use grep() and Return Only Substring


You can use the grep() function in R to find elements in a vector that match a particular pattern.
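As a quick illustration, here is a minimal sketch of what grep() returns. The vector `teams` is a made-up example and not part of this tutorial's data. By default grep() returns the positions of the matching elements, and with value=TRUE it returns the elements themselves:

```r
# hypothetical vector used only for this illustration
teams <- c('Mavs', 'Hawks', 'Nets', 'Cavs')

# positions of the elements that contain 'avs'
idx <- grep('avs', teams)
idx
# [1] 1 4

# the matching elements themselves
vals <- grep('avs', teams, value=TRUE)
vals
# [1] "Mavs" "Cavs"
```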

Often you may want to use the grep() function to find elements that match a particular pattern and then return only part of each matching string, such as the text outside of the matched pattern.

You can use the following basic syntax to do so:

sub('avs', '', df$team[grep('avs', df$team)])

This particular example will match each string in the team column of the data frame named df that contains the pattern “avs” anywhere in the string and then return only the substring outside of the matching pattern.

Note that the sub() function replaces the first occurrence of the matching pattern in each string with nothing, which leaves only the part of the string outside of the matching pattern.
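The substitution step on its own can be sketched as follows. The strings in `x` are hypothetical and chosen only to show that sub() removes the first occurrence of a pattern, while gsub() removes every occurrence:

```r
# hypothetical strings for illustration only
x <- c('Mavs2', 'avsXavs')

# sub() replaces only the first occurrence of 'avs' with an empty string
first_only <- sub('avs', '', x)
first_only
# [1] "M2"   "Xavs"

# gsub() replaces every occurrence of 'avs'
all_matches <- gsub('avs', '', x)
all_matches
# [1] "M2" "X"
```

If a string could contain the pattern more than once, gsub() is probably the safer choice.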

The following example shows how to use this syntax in practice.

Related: R: How to Use grep() to Find Exact Match

Example: How to Use grep() and Return Only Substring in R

Suppose we create the following data frame in R that contains information about various basketball teams:

#create data frame
df <- data.frame(team=c('Mavs', 'Hawks', 'Nets', 'Heat', 'Cavs', 'Mavs2', 'Kings'),
                 points=c(104, 115, 124, 120, 112, 140, 112),
                 status=c('Bad', 'Good', 'Excellent', 'Great', 'Bad', 'Great', 'Bad'))

#view data frame
df

   team points    status
1  Mavs    104       Bad
2 Hawks    115      Good
3  Nets    124 Excellent
4  Heat    120     Great
5  Cavs    112       Bad
6 Mavs2    140     Great
7 Kings    112       Bad

Suppose that we would like to use the grep() function to match each string in the team column of the data frame that contains the pattern “avs” and then only return the substring outside of this matching pattern.

We can use the following syntax to do so:

#for each team that contains 'avs', return the text outside of 'avs'
matches_avs <- sub('avs', '', df$team[grep('avs', df$team)])

#view matches
matches_avs

[1] "M"  "C"  "M2"

This returns a vector that contains all of the substrings outside of the matched patterns in the team column of the data frame.

Here is how to interpret the values in the output:

The first value, “M”, comes from “Mavs”: the string contains “avs”, so only the text outside of the “avs” pattern is returned.

The second value, “C”, comes from “Cavs”: the string contains “avs”, so only the text outside of the “avs” pattern is returned.

The third value, “M2”, comes from “Mavs2”: the string contains “avs”, so only the text outside of the “avs” pattern is returned.
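If you instead want the matched text itself rather than the text outside of it, one possible approach (not the method used in this tutorial) is to combine regexpr() with regmatches(). The sketch below assumes a standalone vector `teams` holding the same team names, and the pattern '[A-Z]avs' is an illustrative choice that matches an uppercase letter followed by “avs”:

```r
# standalone copy of the team names (assumed for this sketch)
teams <- c('Mavs', 'Hawks', 'Nets', 'Heat', 'Cavs', 'Mavs2', 'Kings')

# regexpr() locates the first match in each string;
# regmatches() extracts the matched text and skips non-matching strings
matched <- regmatches(teams, regexpr('[A-Z]avs', teams))
matched
# [1] "Mavs" "Cavs" "Mavs"
```

Note that the trailing “2” in “Mavs2” is dropped here because it is not part of the pattern.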

Note that the grep() function in R is case-sensitive by default. This means that it will only match the patterns that have the exact same case as the pattern that you specify.

For example, if we instead searched for the pattern “AVS” in the grep() function then there would be no matches because there are no strings in the team column of the data frame that match “AVS” with all uppercase letters.

Keep this in mind when using the grep() function to search for matching patterns.
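To make the search case-insensitive, both grep() and sub() accept an ignore.case argument. The following is a minimal sketch that assumes a standalone vector `teams` with the same team names:

```r
# standalone copy of the team names (assumed for this sketch)
teams <- c('Mavs', 'Hawks', 'Nets', 'Heat', 'Cavs', 'Mavs2', 'Kings')

# case-sensitive search for 'AVS' finds nothing
no_match <- grep('AVS', teams)
no_match
# integer(0)

# case-insensitive search finds the same elements as 'avs'
idx <- grep('AVS', teams, ignore.case=TRUE)
idx
# [1] 1 5 6

# remove the pattern, also ignoring case
outside <- sub('AVS', '', teams[idx], ignore.case=TRUE)
outside
# [1] "M"  "C"  "M2"
```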

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Concatenate Vector of Strings in R
How to Extract Numbers from Strings in R
How to Remove Spaces from Strings in R
How to Compare Strings in R
