R: How to Split Column by Number of Characters


Often you may want to split one column in R into multiple columns based on a specific number of characters.

There are two common ways to do so:

Method 1: Use transform() from Base R

transform(df, Num=substr(emp, 1, 4), Alpha=substr(emp, 5, 6))

This particular method uses the transform() function from base R to split the column named emp into two separate columns in which the first column will contain the first 4 characters of the string in the original column and the second column will contains the characters in positions 5 through 6 of the string.

Note that we name the two new columns Num and Alpha, respectively.

Method 2: Use separate() from tidyr

library(tidyr)

df %>% separate(emp, into = c('Num', 'Alpha'), sep = 4)

This particular method uses the separate() function from base R to split the column named emp into two separate columns in which the first column will contain the first 4 characters of the string in the original column and the second column will contain all remaining characters in the string.

Note that we also name the two new columns Num and Alpha, respectively.

The following examples show how to use each method in practice with the following data frame that contains information about employees at some company:

#create data frame
df <- data.frame(emp=c('4007AB', '5003BB', '6400AC', '8378CC', '9004CD'),
                 sales=c(130, 298, 200, 454, 238))

#view data frame
df

Note that the emp column contains the employee ID value and the sales column contains the number of total sales made by each employee during a given time period.

Method 1: Split Column by Number of Characters Using transform() from Base R

Suppose that we would like to split the emp column so that the first four characters are split into one column and the remaining two characters are split into a second column.

One way to do so is by using the transform() function from base R:

#split emp column into two columns based on specific number of characters
transform(df, Num=substr(emp, 1, 4), Alpha=substr(emp, 5, 6))

     emp sales  Num Alpha
1 4007AB   130 4007    AB
2 5003BB   298 5003    BB
3 6400AC   200 6400    AC
4 8378CC   454 8378    CC
5 9004CD   238 9004    CD

Notice that the transform() function has split the strings in the emp column such that the first four characters are split into one column and the remaining two characters are split into a second column.

Method 2: Split Column by Number of Characters Using separate() from tidyr

Once again, suppose that we would like to split the emp column so that the first four characters are split into one column and the remaining two characters are split into a second column.

One way to do so is by using the separate() function from the tidyr package:

library(tidyr) 

#split emp column into two columns based on specific number of characters
df %>% separate(emp, into = c('Num', 'Alpha'), sep = 4) 

   Num Alpha sales
1 4007    AB   130
2 5003    BB   298
3 6400    AC   200
4 8378    CC   454
5 9004    CD   238

Notice that the separate() function has split the strings in the emp column such that the first four characters are split into one column and the remaining two characters are split into a second column.

Note: You can find the complete documentation for the separate() function from the tidyr package here.

Additional Resources

The following tutorials explain how to perform other common operations in R:

How to Combine Two Columns into One in R
How to Sort a Data Frame by Column in R
How to Add Columns to Data Frame in R

Leave a Reply

Your email address will not be published. Required fields are marked *