How to Use the trimws Function in R


Often you may want to trim leading or trailing whitespace from strings in R.

One of the easiest ways to do so is by using the trimws() function from base R, which is designed to perform this exact task.

The trimws() function uses the following syntax:

trimws(x, which = c("both", "left", "right"), whitespace = "[ \t\r\n]")

where:

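  • x: A character vector containing the strings to trim
  • which: Side from which to remove whitespace: "both" (the default), "left" or "right"
  • whitespace: A regular expression that matches a single whitespace character (by default a space, tab, carriage return or newline)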
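As a quick illustration of the whitespace argument, the following minimal sketch trims a small made-up vector first with the default pattern and then with a custom one. The vector x and the pattern "[ _]" are assumptions chosen for illustration, and the whitespace argument is only available in R 3.6.0 or later.

```r
# Illustrative strings (not part of the data frame example)
x <- c("  A004  ", "\tA005\n", "__B003__")

# Default pattern: strips spaces, tabs, carriage returns and newlines
default_trim <- trimws(x)
# elements: "A004", "A005", "__B003__"

# Custom pattern: strip spaces and underscores instead
custom_trim <- trimws(x, whitespace = "[ _]")
# elements: "A004", "\tA005\n" (tab and newline kept), "B003"
```

Note that a custom pattern replaces the default rather than adding to it, which is why the tab and newline survive in the second call.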

The following examples show how to use the trimws() function in practice to trim whitespace from strings.

Example: How to Use the trimws() Function in R

Suppose we create the following data frame in R that contains information about employee ID numbers and total sales made by each of the employees at some company:

#create data frame
df <- data.frame(ID=c(' A004 ', 'A005', 'B003  ', ' C099', 'D145', 'A003 '),
                 sales=c(12, 15, 22, 24, 20, 40))

#view data frame
df

      ID sales
1  A004     12
2   A005    15
3 B003      22
4   C099    24
5   D145    20
6  A003     40

We can see that several of the values in the ID column of the data frame contain leading whitespace, trailing whitespace, or both.
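Stray whitespace can be hard to spot in printed output, so it may help to check for it programmatically. The sketch below is one possible way to do this: it compares each string with its trimmed version and counts characters with nchar(). The name has_ws is illustrative, and stringsAsFactors = FALSE is set explicitly so that the ID column is a character vector in older versions of R as well.

```r
# Same data as above; keep ID as character in every R version
df <- data.frame(ID = c(' A004 ', 'A005', 'B003  ', ' C099', 'D145', 'A003 '),
                 sales = c(12, 15, 22, 24, 20, 40),
                 stringsAsFactors = FALSE)

# Number of characters in each ID, including any whitespace
nchar(df$ID)
# 6 4 6 5 4 5

# TRUE where trimming would change the string
has_ws <- df$ID != trimws(df$ID)
has_ws
# TRUE FALSE TRUE TRUE FALSE TRUE
```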

Suppose that we would like to trim the whitespaces from both sides of each string in the ID column.

We can use the trimws() function with the following syntax to do so:

#trim whitespaces from strings in 'ID' column
df$ID <- trimws(df$ID)

#view updated data frame
df

    ID sales
1 A004    12
2 A005    15
3 B003    22
4 C099    24
5 D145    20
6 A003    40

We can see that both the leading and trailing whitespaces have been trimmed from each string in the ID column of the data frame.

Note: By default, the trimws() function trims whitespace from "both" sides of the string, which is why both the leading and trailing whitespace were removed from each string even though we didn't specify a value for the which argument.
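To confirm this default, a short sketch such as the following compares a call without the which argument to one that passes "both" explicitly. The vector ids and the other variable names are made up for illustration.

```r
# Illustrative vector with whitespace on different sides
ids <- c(" A004 ", "A005", "B003  ")

# Call without 'which' and call with which = "both"
default_trim  <- trimws(ids)
explicit_trim <- trimws(ids, which = "both")

# The two results are the same
identical(default_trim, explicit_trim)
# TRUE
```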

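If instead we would like to trim only the leading whitespace from each string in the ID column, we can re-create the original data frame (the ID column was already fully trimmed in the previous step) and then use the following syntax: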

#trim leading whitespaces only from strings in 'ID' column
df$ID <- trimws(df$ID, which='left')

#view updated data frame
df

      ID sales
1  A004     12
2   A005    15
3 B003      22
4   C099    24
5   D145    20
6  A003     40

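Only the leading whitespace has been trimmed from each string in the ID column of the data frame. Because R right-aligns character columns when it prints a data frame, the output looks identical to the original: values such as 'A004 ' and 'B003  ' still carry their trailing spaces.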
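Since a printed data frame pads every column to a common width, one way to confirm which side was trimmed is to compare string lengths or wrap each value in visible delimiters. The following is a minimal sketch that assumes the original, untrimmed ID values; the names ids, left_trimmed and bracketed are illustrative.

```r
# Original, untrimmed ID values
ids <- c(' A004 ', 'A005', 'B003  ', ' C099', 'D145', 'A003 ')

# Trim leading whitespace only
left_trimmed <- trimws(ids, which = "left")

# Character counts before and after trimming
nchar(ids)
# 6 4 6 5 4 5
nchar(left_trimmed)
# 5 4 6 4 4 5

# Wrap each value in brackets so any remaining spaces are visible
bracketed <- paste0("[", left_trimmed, "]")
bracketed
```

Only the strings that started with a space become shorter, while strings with trailing spaces only, such as 'B003  ', keep their length.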

If instead we would like to trim only the trailing whitespace from each string in the ID column, we can again start from the original data frame and use the following syntax:

#trim trailing whitespaces only from strings in 'ID' column
df$ID <- trimws(df$ID, which='right')

#view updated data frame
df

     ID sales
1  A004    12
2  A005    15
3  B003    22
4  C099    24
5  D145    20
6  A003    40

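Only the trailing whitespace has been trimmed from each string in the ID column of the data frame. The values ' A004' and ' C099' still begin with a space, which makes the column five characters wide; the other values are simply padded to that width when printed.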

Depending on your goal, you can use "both", "left" or "right" for the which argument of the trimws() function.
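To compare the three options side by side, a sketch like the one below applies each value of which to the same original strings and collects the results in a character matrix. The names ids, sides and trimmed are illustrative and are not part of the example above.

```r
# Original, untrimmed ID values
ids <- c(' A004 ', 'A005', 'B003  ', ' C099', 'D145', 'A003 ')

# Apply each option to the same input; the columns are named after the option
sides <- c("both", "left", "right")
trimmed <- sapply(sides, function(side) trimws(ids, which = side))

# Character counts show which side was trimmed for each option
nchar(trimmed)
```

In the resulting counts, the "both" column contains only 4s, while the "left" and "right" columns keep the extra characters from whichever side was not trimmed.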

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Use str_replace in R
How to Use str_split in R
How to Use str_detect in R
How to Use str_count in R
