# Regular Expressions in R

When working with datasets, you often need to search for or replace text within strings. Fortunately, R has several built-in functions for this.

This tutorial explains how to use the following functions to search for matches:

• grep()
• grepl()

This tutorial also explains how to use the following functions for performing replacements:

• sub()
• gsub()

## Using grep() and grepl() to search for strings

The function grep() returns an integer vector of indices of the elements of a vector that match some pattern. The basic syntax for grep() is as follows:

grep(pattern, x, value = FALSE)

• pattern: character string containing the pattern (a regular expression) to be matched in a given vector
• x: a vector where matches are sought
• value: if FALSE, a vector containing the indices of the matches is returned, and if TRUE, a vector containing the matching elements themselves is returned.

The following code illustrates an example of using grep() to find the indices of elements in a vector that match some pattern:

```
#create a character vector x with five names
x <- c('bob', 'adam', 'doug', 'larry', 'harry')

#return a vector of indices that match the pattern 'arry'
grep('arry', x)

#[1] 4 5

#return the actual elements that match the pattern 'arry'
grep('arry', x, value = TRUE)

#[1] "larry" "harry"
```
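Because the pattern is interpreted as a regular expression, grep() can do more than match a plain substring. The following is a minimal sketch using the same vector of names; the variable names starts_ab, upper_match, and no_match are introduced here only for illustration:

```r
#same vector of names as in the example above
x <- c('bob', 'adam', 'doug', 'larry', 'harry')

#'^' anchors the match to the start of the string:
#indices of the names that begin with 'a' or 'b'
starts_ab <- grep('^[ab]', x)
starts_ab

#[1] 1 2

#ignore.case = TRUE makes the match case-insensitive
upper_match <- grep('ARRY', x, ignore.case = TRUE, value = TRUE)
upper_match

#[1] "larry" "harry"

#when nothing matches, grep() returns a zero-length integer vector
no_match <- grep('zed', x)
length(no_match)

#[1] 0
```

The zero-length result is worth checking for before using the indices to subset another object.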

The function grepl() returns a logical vector with one value for each element of a vector: TRUE if the element matches some pattern and FALSE if it does not. The basic syntax for grepl() is as follows:

grepl(pattern, x)

• pattern: character string containing the pattern (a regular expression) to be matched in a given vector
• x: a vector where matches are sought

The following code illustrates an example of using grepl() to test whether each element in a vector matches some pattern:

```
#create a character vector x with five names
x <- c('bob', 'adam', 'doug', 'larry', 'harry')

#return a logical vector indicating which elements match the pattern 'arry'
grepl('arry', x)

#[1] FALSE FALSE FALSE  TRUE  TRUE
```
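Since grepl() returns one logical value per element, it is convenient for filtering rows. The following sketch assumes a small hypothetical data frame df whose score values are made up for illustration:

```r
#hypothetical data frame, made up for this illustration
df <- data.frame(name = c('bob', 'adam', 'doug', 'larry', 'harry'),
                 score = c(90, 85, 77, 92, 68),
                 stringsAsFactors = FALSE)

#keep only the rows whose name matches the pattern 'arry'
arry_rows <- df[grepl('arry', df$name), ]
arry_rows$name

#[1] "larry" "harry"

#negate the logical vector to keep the rows that do not match
other_rows <- df[!grepl('arry', df$name), ]
other_rows$name

#[1] "bob"  "adam" "doug"

#TRUE counts as 1, so sum() gives the number of matching elements
n_matches <- sum(grepl('arry', df$name))
n_matches

#[1] 2
```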

## Using sub() and gsub() to replace strings

The function sub() replaces the first occurrence of a pattern in each element of a vector with a user-specified replacement string. The basic syntax for sub() is as follows:

sub(pattern, replacement, x)

• pattern: character string containing the pattern (a regular expression) to be searched for in a given vector
• replacement: a replacement string for a matched pattern
• x: a vector where matches are sought

The following code illustrates an example of using sub() to replace the first occurrence of a substring with another user-specified substring:

```
#create a sentence
sentence <- 'Jessica likes Hawaii. She would like to live in Hawaii some day.'

#replace the first occurrence of 'Hawaii' with 'HI'
sub('Hawaii', 'HI', sentence)

#[1] "Jessica likes HI. She would like to live in Hawaii some day."
```
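When x contains more than one element, sub() works on each element separately, so "first occurrence" means the first occurrence within each string. A brief sketch, using a made-up vector named phrases:

```r
#made-up vector for illustration
phrases <- c('aaa', 'banana', 'cherry')

#replace the first 'a' in each element; elements without a match are unchanged
first_only <- sub('a', 'A', phrases)
first_only

#[1] "Aaa"    "bAnana" "cherry"
```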

The function gsub() replaces all occurrences of a pattern in each element of a vector with a user-specified replacement string. The basic syntax for gsub() is as follows:

gsub(pattern, replacement, x)

• pattern: character string containing the pattern (a regular expression) to be searched for in a given vector
• replacement: a replacement string for a matched pattern
• x: a vector where matches are sought

The following code illustrates an example of using gsub() to replace all occurrences of a substring with another user-specified substring:

```
#create a sentence
sentence <- 'Jessica likes Hawaii. She would like to live in Hawaii some day.'

#replace all occurrences of 'Hawaii' with 'HI'
gsub('Hawaii', 'HI', sentence)

#[1] "Jessica likes HI. She would like to live in HI some day."
```
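One thing to watch for: sub() and gsub() treat the pattern as a regular expression by default, so characters such as '.' have a special meaning. The sketch below, which uses a made-up file name, shows two ways to match a literal period: the fixed = TRUE argument and an escaped pattern.

```r
#create a file name that contains literal periods (made up for illustration)
filename <- 'report.2019.csv'

#'.' is a regular expression metacharacter that matches any character,
#so this replaces every character in the string, not just the periods
all_replaced <- gsub('.', '_', filename)

#fixed = TRUE treats the pattern as a literal string
literal_fixed <- gsub('.', '_', filename, fixed = TRUE)
literal_fixed

#[1] "report_2019_csv"

#escaping the period with a double backslash gives the same result
literal_escaped <- gsub('\\.', '_', filename)
literal_escaped

#[1] "report_2019_csv"
```

The fixed = TRUE form is usually the simpler choice when the text to replace is known exactly and no pattern matching is needed.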

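The pattern passed to gsub() can be any regular expression, not just a literal substring. The following code illustrates how to remove all digits from a sentence by replacing them with an empty string. It uses the regular expression \d, which matches any single digit and is written as '\\d' in R because the backslash itself must be escaped inside a string.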

```
#create a sentence
sentence <- 'Jessica likes Hawaii in 2019. She wants to live in Hawaii in 2025.'

#remove all digits by replacing each one with an empty string
gsub('\\d', '', sentence)

#[1] "Jessica likes Hawaii in . She wants to live in Hawaii in ."
```
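Regular expressions can also match runs of characters and reuse what they matched. The following sketch builds on the same sentence; the variable names years_masked, no_years, and bracketed are introduced here only for illustration:

```r
#same sentence as in the example above
sentence <- 'Jessica likes Hawaii in 2019. She wants to live in Hawaii in 2025.'

#'\\d+' matches a whole run of digits, so each year is replaced once
years_masked <- gsub('\\d+', 'YEAR', sentence)
years_masked

#[1] "Jessica likes Hawaii in YEAR. She wants to live in Hawaii in YEAR."

#remove each year together with the ' in' before it
no_years <- gsub(' in \\d+', '', sentence)
no_years

#[1] "Jessica likes Hawaii. She wants to live in Hawaii."

#parentheses capture a group and '\\1' reuses it in the replacement
bracketed <- gsub('(\\d+)', '[\\1]', sentence)
bracketed

#[1] "Jessica likes Hawaii in [2019]. She wants to live in Hawaii in [2025]."
```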

Check out this Regular Expression Cheat Sheet to find several different regular expressions you can use with the gsub() function.