How to Use str_extract_all() in R


Often you may want to extract every match of a particular pattern from a string in R.

One of the easiest ways to do so is with the str_extract_all() function from the stringr package.

The str_extract_all() function uses the following syntax:

str_extract_all(string, pattern, simplify = FALSE)

where:

  • string: An input character vector
  • pattern: The pattern to look for, interpreted as a regular expression by default
  • simplify: FALSE (the default) returns a list of character vectors. TRUE returns a character matrix.
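
Because the pattern is treated as a regular expression by default, it does not have to be a literal word. The following is a minimal sketch of that idea; the vector name codes and its contents are made up purely for illustration:

```r
library(stringr)

#hypothetical vector used only for this illustration
codes <- c('a1 b22', 'none', 'c333')

#extract every run of one or more digits from each string
digit_matches <- str_extract_all(codes, '[0-9]+')
digit_matches
```

Each element of the resulting list holds the digit runs found in the corresponding string, and a string with no digits produces an empty character vector.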

The following example shows how to use the str_extract_all() function in practice to extract all occurrences of specific patterns in strings in R.

Note: Before using the str_extract_all() function, you may need to first install the stringr package. You can use the following syntax to do so:

install.packages('stringr')

Once the stringr package has been installed, you can proceed to use the str_extract_all() function.

Example: How to Use the str_extract_all() Function in R

Suppose that we create the following vector that contains various strings:

#create vector of strings
my_strings <- c('hey hey there', 'oh hey', 'hello everyone', 'heyo how are you')

#view vector
my_strings

[1] "hey hey there"    "oh hey"           "hello everyone"   "heyo how are you"

Suppose that we would like to extract each occurrence of “hey” in each string.

We can use the str_extract_all() function with the following syntax to do so:

library(stringr)

#extract all occurrences of "hey" in each string
str_extract_all(my_strings, 'hey')

[[1]]
[1] "hey" "hey"

[[2]]
[1] "hey"

[[3]]
character(0)

[[4]]
[1] "hey"

Notice that each occurrence of “hey” has been extracted from each string.

Here is how to interpret the output:

  • The first string in the vector contained two instances of “hey”
  • The second string in the vector contained one instance of “hey”
  • The third string in the vector contained zero instances of “hey”
  • The fourth string in the vector contained one instance of “hey” (as part of the word “heyo”, since the pattern matches substrings)
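
If only the number of matches per string is needed, the interpretation above can be checked in code. This is a small sketch that applies base R's lengths() function to the list returned by str_extract_all(); the variable names matches and match_counts are chosen here for illustration:

```r
library(stringr)

#recreate the vector of strings from the example
my_strings <- c('hey hey there', 'oh hey', 'hello everyone', 'heyo how are you')

#extract all occurrences of "hey", then count the matches in each list element
matches <- str_extract_all(my_strings, 'hey')
match_counts <- lengths(matches)
match_counts

[1] 2 1 0 1
```

The str_count() function from stringr should give the same counts directly, without building the list of matches first.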

Note that we could also specify simplify = TRUE to instead return a character matrix as a result.

We can use the following syntax to do so:

library(stringr)

#extract all occurrences of "hey" in each string
str_extract_all(my_strings, 'hey', simplify = TRUE)

     [,1]  [,2] 
[1,] "hey" "hey"
[2,] "hey" ""   
[3,] ""    ""   
[4,] "hey" ""   

This returns the same matches as the previous example, except that they are now arranged in a character matrix instead of a list. Each row corresponds to one input string, and rows with fewer matches are padded with empty strings. Feel free to return either a list or a matrix, whichever you prefer.

Note: The str_extract() function from the stringr package extracts only the first occurrence of a pattern in each string. Use that function instead if you don’t need every occurrence.
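
To illustrate the difference from str_extract() mentioned in the note above, here is a short sketch that runs it on the same vector. The variable name first_match is chosen here for illustration:

```r
library(stringr)

#recreate the vector of strings from the example
my_strings <- c('hey hey there', 'oh hey', 'hello everyone', 'heyo how are you')

#extract only the first occurrence of "hey" in each string
first_match <- str_extract(my_strings, 'hey')
first_match

[1] "hey" "hey" NA    "hey"
```

The result is a plain character vector with one element per input string, and strings without a match return NA rather than an empty vector.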

It’s also worth noting that the str_extract_all() function is case-sensitive by default, which means that the pattern must match the exact case of the text or else the match will not be extracted.
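
If case should be ignored, one option is to wrap the pattern in the regex() helper from stringr with ignore_case = TRUE. The following sketch assumes a made-up input string, 'Hey hey there', to show the difference:

```r
library(stringr)

#hypothetical string with mixed case, used only for this illustration
mixed_case <- 'Hey hey there'

#default behavior: only the lowercase "hey" matches
case_sensitive <- str_extract_all(mixed_case, 'hey')

#ignore case: both "Hey" and "hey" match
case_insensitive <- str_extract_all(mixed_case, regex('hey', ignore_case = TRUE))
case_insensitive

[[1]]
[1] "Hey" "hey"
```

Note that the matches are returned exactly as they appear in the original string, so the capital letter in “Hey” is preserved.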

Note: You can find the complete documentation for the str_extract_all() function in the stringr package reference.

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Use str_remove in R
How to Use str_match in R
How to Use str_pad in R
How to Use str_replace_all() in R
