How to Use Wildcard Characters in R


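Often you may want to use wildcard-style patterns in R to select specific rows that contain certain substrings.

The easiest way to do this is with the grep() function in R. Note that grep() matches regular expressions rather than shell-style wildcards: ^ anchors a pattern to the start of a string and $ anchors it to the end.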

Here are the most common ways to do so:

Method 1: Select Rows Where String Starts With

#select all rows where team column starts with 'Ma' (^ anchors to the start)
df[grep('^Ma', df$team),]

Method 2: Select Rows Where String Ends With

#select all rows where team column ends with 's' ($ anchors to the end)
df[grep('s$', df$team),]

Method 3: Select Rows Where String Contains

#select all rows where team column contains 'av'
df[grep('av', df$team),]

Note that the first argument of the grep() function specifies the pattern to look for and the second argument specifies the column to look for the pattern in.

In each of these examples, we search for a specific pattern in the team column of the data frame named df.

We then use basic subset notation to extract the rows from the data frame that contain the specific pattern we searched for in the team column.
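To make the two steps concrete, here is a minimal sketch that shows what grep() returns before it is used for subsetting. The vector name team is only an illustration, and grepl() is shown as a closely related base R function that returns TRUE/FALSE values instead of positions.

```r
# Illustrative vector (same team names as the data frame used in this tutorial)
team <- c('Mavs', 'Nets', 'Magic', 'Heat', 'Cavs')

# grep() returns the integer positions of the elements that match
idx <- grep('av', team)
idx
# [1] 1 5

# grepl() returns a logical vector with one value per element
is_match <- grepl('av', team)
is_match
# [1]  TRUE FALSE FALSE FALSE  TRUE

# Either result can be placed inside [ ] to extract the matches
team[idx]
team[is_match]
```

The logical form is often the safer choice when you want to exclude matches, since negating it with ! still works when nothing matches, whereas a negative index built from an empty grep() result selects nothing.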

The following examples show how to use each method in practice with the following data frame in R:

#create data frame
df <- data.frame(team=c('Mavs', 'Nets', 'Magic', 'Heat', 'Cavs'),
                 points=c(99, 68, 86, 88, 95),
                 assists=c(22, 28, 45, 28, 31),
                 rebounds=c(30, 28, 36, 30, 29))

#view data frame
df

   team points assists rebounds
1  Mavs     99      22       30
2  Nets     68      28       28
3 Magic     86      45       36
4  Heat     88      28       30
5  Cavs     95      31       29

Example 1: Select Rows where String Starts With

We can use the following syntax to select all rows in the data frame where the string in the team column starts with the pattern ‘Ma’:

#select all rows where team column starts with 'Ma' (^ anchors to the start)
df[grep('^Ma', df$team),]

   team points assists rebounds
1  Mavs     99      22       30
3 Magic     86      45       36

This returns the rows from the data frame with the following values in the team column:

  • Mavs
  • Magic

Notice that both of these team names start with the pattern ‘Ma’, just as we specified.
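The start anchor matters more than this small data frame suggests. In a regular expression, * means "zero or more of the preceding character", so a pattern such as 'Ma*' matches any string that contains an 'M'. The sketch below uses a hypothetical vector of names (not part of the data frame above) to show the difference.

```r
# Hypothetical names chosen to expose the difference between the patterns
names_vec <- c('Mavs', 'Magic', 'Miami', 'LaMarcus')

# 'Ma*' means "M followed by zero or more a", so every name with an M matches
loose <- names_vec[grep('Ma*', names_vec)]
loose

# '^Ma' requires the string to begin with 'Ma'
anchored <- names_vec[grep('^Ma', names_vec)]
anchored
```

Here the unanchored pattern returns all four names, while the anchored pattern returns only Mavs and Magic.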

Example 2: Select Rows where String Ends With

We can use the following syntax to select all rows in the data frame where the string in the team column ends with the pattern ‘s’:

#select all rows where team column ends with 's' ($ anchors to the end)
df[grep('s$', df$team),]

  team points assists rebounds
1 Mavs     99      22       30
2 Nets     68      28       28
5 Cavs     95      31       29

This returns the rows from the data frame with the following values in the team column:

  • Mavs
  • Nets
  • Cavs

Notice that each of these team names ends with the pattern ‘s’, just as we specified.
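When the prefix or suffix is a fixed piece of text rather than a pattern, base R's startsWith() and endsWith() functions (available since R 3.3.0) are an alternative that avoids regular expressions entirely. The following is a sketch under the assumption that the team column is stored as character, which is why stringsAsFactors = FALSE is set explicitly.

```r
# Rebuild a reduced version of the data frame so the example is self-contained
df <- data.frame(team = c('Mavs', 'Nets', 'Magic', 'Heat', 'Cavs'),
                 points = c(99, 68, 86, 88, 95),
                 stringsAsFactors = FALSE)  # keep team as character

# Rows where team starts with the literal text 'Ma'
starts_ma <- df[startsWith(df$team, 'Ma'), ]
starts_ma

# Rows where team ends with the literal text 's'
ends_s <- df[endsWith(df$team, 's'), ]
ends_s
```

Because these functions compare literal text, characters such as . or * in the prefix or suffix are not treated as special.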

Example 3: Select Rows where String Contains

We can use the following syntax to select all rows in the data frame where the string in the team column contains the pattern ‘av’ anywhere in the string:

#select all rows where team column contains 'av'
df[grep('av', df$team),]

  team points assists rebounds
1 Mavs     99      22       30
5 Cavs     95      31       29

This returns the rows from the data frame with the following values in the team column:

  • Mavs
  • Cavs

Notice that both of these team names contain the pattern ‘av’ in the name.
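If you prefer to write shell-style wildcards, where * stands for "any sequence of characters", the glob2rx() function from the utils package converts such a pattern into a regular expression that grep() understands. The sketch below applies it to all three cases; the variable names are illustrative only.

```r
# Rebuild a reduced version of the data frame so the example is self-contained
df <- data.frame(team = c('Mavs', 'Nets', 'Magic', 'Heat', 'Cavs'),
                 points = c(99, 68, 86, 88, 95),
                 stringsAsFactors = FALSE)

# Convert shell-style wildcards to regular expressions
starts_rx   <- utils::glob2rx('Ma*')   # starts with 'Ma'
ends_rx     <- utils::glob2rx('*s')    # ends with 's'
contains_rx <- utils::glob2rx('*av*')  # contains 'av'

# Use the converted patterns with grep() as before
starts_rows   <- df[grep(starts_rx, df$team), ]
ends_rows     <- df[grep(ends_rx, df$team), ]
contains_rows <- df[grep(contains_rx, df$team), ]

starts_rows
ends_rows
contains_rows
```

These three calls select the same rows as the anchored patterns '^Ma', 's$' and 'av' used in the examples above.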

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Use str_split in R
How to Use str_replace in R
How to Count Words in String in R
How to Convert a Vector to String in R
