You can use the following methods in R to check if a string contains multiple substrings:
Method 1: Check if String Contains One of Several Substrings
df$contains_any <- apply(sapply(find_strings, grepl, df$team), 1, any)
This particular syntax checks if each string in the team column contains any of the strings specified in the vector of strings called find_strings.
Method 2: Check if String Contains Several Substrings
df$contains_all <- apply(sapply(find_strings, grepl, df$team), 1, all)
This particular syntax checks if each string in the team column contains all of the strings specified in the vector of strings called find_strings.
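To see why both methods work, it can help to look at the intermediate result that sapply() produces before apply() reduces it. The sketch below uses a small hypothetical vector called teams (not the data frame from this tutorial) purely for illustration. Keep in mind that grepl() is case-sensitive by default and treats each substring as a regular expression.

```r
# Hypothetical sample inputs, used only for illustration
teams <- c('Good East Team', 'Bad West Team', 'Great East Team')
find_strings <- c('Good', 'East')

# sapply() calls grepl(pattern, teams) once per substring and binds the
# results into a logical matrix: one row per string, one column per substring
hits <- sapply(find_strings, grepl, teams)
hits

# apply(..., 1, any) reduces each row to TRUE if at least one column is TRUE
contains_any <- apply(hits, 1, any)

# apply(..., 1, all) reduces each row to TRUE only if every column is TRUE
contains_all <- apply(hits, 1, all)
```

Here hits has three rows (one per string) and two columns named Good and East, so the only difference between the two methods is the function used to collapse each row.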
The examples below show how to use each method in practice with this data frame in R:
#create data frame
df <- data.frame(team=c('Good East Team', 'Good West Team', 'Great East Team',
                        'Great West Team', 'Bad East Team', 'Bad West Team'),
                 points=c(93, 99, 105, 110, 85, 88))
#view data frame
df
             team points
1  Good East Team     93
2  Good West Team     99
3 Great East Team    105
4 Great West Team    110
5   Bad East Team     85
6   Bad West Team     88
Example 1: Check if String Contains One of Several Substrings
We can use the following syntax to check if each string in the team column contains either the substring “Good” or “East”:
#define substrings to look for
find_strings <- c('Good', 'East')
#check if each string in team column contains either substring
df$good_or_east <- apply(sapply(find_strings, grepl, df$team), 1, any)
#view updated data frame
df
             team points good_or_east
1  Good East Team     93         TRUE
2  Good West Team     99         TRUE
3 Great East Team    105         TRUE
4 Great West Team    110        FALSE
5   Bad East Team     85         TRUE
6   Bad West Team     88        FALSE
The new good_or_east column returns the following values:
- TRUE if team contains “Good” or “East”
- FALSE if team contains neither “Good” nor “East”
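As an alternative sketch for the "any" case, the substrings can be joined into a single regular expression with the | operator and passed to grepl() once. This is not the method used above, and it assumes that none of the substrings contain regex metacharacters such as . or (. The vector teams below simply repeats the team names from the data frame so that the snippet runs on its own.

```r
# Team names copied from the example data frame so the snippet is self-contained
teams <- c('Good East Team', 'Good West Team', 'Great East Team',
           'Great West Team', 'Bad East Team', 'Bad West Team')
find_strings <- c('Good', 'East')

# join the substrings into one pattern: "Good|East"
pattern <- paste(find_strings, collapse = '|')

# TRUE wherever the string matches at least one of the alternatives
good_or_east <- grepl(pattern, teams)
good_or_east
```

The result agrees with the good_or_east column shown above. This shortcut does not extend to the "all" case, because a pattern built with | only tests whether at least one alternative matches.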
Example 2: Check if String Contains Several Substrings
We can use the following syntax to check if each string in the team column contains both the substring “Good” and the substring “East”:
#define substrings to look for
find_strings <- c('Good', 'East')
#check if each string in team column contains both substrings
df$good_and_east <- apply(sapply(find_strings, grepl, df$team), 1, all)
#view updated data frame
df
             team points good_and_east
1  Good East Team     93          TRUE
2  Good West Team     99         FALSE
3 Great East Team    105         FALSE
4 Great West Team    110         FALSE
5   Bad East Team     85         FALSE
6   Bad West Team     88         FALSE
The new good_and_east column returns the following values:
- TRUE if team contains “Good” and “East”
- FALSE if team is missing “Good”, “East”, or both
Notice that only one TRUE value is returned since there is only one team name that contains the substring “Good” and the substring “East.”
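One edge case worth knowing: if the column holds only a single string, sapply() simplifies its result to a plain vector instead of a matrix, and apply() then stops with an error because the input has no dimensions. A possible workaround, sketched below as an alternative rather than as the method used in this tutorial, is to keep the per-substring results in a list with lapply() and combine them with Reduce(). The fixed = TRUE argument is an added assumption here; it makes grepl() match each substring literally rather than as a regular expression.

```r
# Team names copied from the example data frame so the snippet is self-contained
teams <- c('Good East Team', 'Good West Team', 'Great East Team',
           'Great West Team', 'Bad East Team', 'Bad West Team')
find_strings <- c('Good', 'East')

# one logical vector per substring; fixed = TRUE matches the text literally
hits <- lapply(find_strings, grepl, x = teams, fixed = TRUE)

# combine the vectors element by element
good_and_east <- Reduce(`&`, hits)   # TRUE only where every substring is found
good_or_east  <- Reduce(`|`, hits)   # TRUE where at least one substring is found

# the same approach still works when there is only one string to check
single <- Reduce(`&`, lapply(find_strings, grepl, x = 'Good East Team', fixed = TRUE))
```

Both results agree with the good_and_east and good_or_east columns from the examples above.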