How to Use Mutate to Create New Variables in R

How to use the mutate function in R

This tutorial explains how to use the mutate() function in R to add new variables to a data frame.

Adding New Variables in R

The following functions from the dplyr library can be used to add new variables to a data frame:

mutate() – adds new variables to a data frame while preserving existing variables

transmute() – adds new variables to a data frame and drops existing variables

mutate_all() –  modifies all of the variables in a data frame at once

mutate_at() –  modifies specific variables by name

mutate_if() – modifies all variables that meet a certain condition

mutate()

The mutate() function adds new variables to a data frame while preserving any existing variables. The basic synax for mutate() is as follows:

data <- mutate(new_variable = existing_variable/3)
  • data: the new data frame to assign the new variables to
  • new_variable: the name of the new variable
  • existing_variable: the existing variable in the data frame that you wish to perform some operation on to create the new variable

For example, the following code illustrates how to add a new variable root_sepal_width to the built-in iris dataset:

#define data frame as the first six lines of the iris dataset
data <- head(iris)

#view data
data

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4         0.2  setosa
#2          4.9         3.0          1.4         0.2  setosa
#3          4.7         3.2          1.3         0.2  setosa
#4          4.6         3.1          1.5         0.2  setosa
#5          5.0         3.6          1.4         0.2  setosa
#6          5.4         3.9          1.7         0.4  setosa

#load dplyr library
library(dplyr)

#define new column root_sepal_width as the square root of the Sepal.Width variable
data %>% mutate(root_sepal_width = sqrt(Sepal.Width))

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species root_sepal_width
#1          5.1         3.5          1.4         0.2  setosa         1.870829
#2          4.9         3.0          1.4         0.2  setosa         1.732051
#3          4.7         3.2          1.3         0.2  setosa         1.788854
#4          4.6         3.1          1.5         0.2  setosa         1.760682
#5          5.0         3.6          1.4         0.2  setosa         1.897367
#6          5.4         3.9          1.7         0.4  setosa         1.974842

transmute()

The transmute() function adds new variables to a data frame and drops existing variables. The following code illustrates how to add two new variables to a dataset and remove all existing variables:

#define data frame as the first six lines of the iris dataset
data <- head(iris)

#view data
data

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4         0.2  setosa
#2          4.9         3.0          1.4         0.2  setosa
#3          4.7         3.2          1.3         0.2  setosa
#4          4.6         3.1          1.5         0.2  setosa
#5          5.0         3.6          1.4         0.2  setosa
#6          5.4         3.9          1.7         0.4  setosa

#define two new variables and remove all existing variables
data %>% transmute(root_sepal_width = sqrt(Sepal.Width),
                   root_petal_width = sqrt(Petal.Width))

#  root_sepal_width root_petal_width
#1         1.870829        0.4472136
#2         1.732051        0.4472136
#3         1.788854        0.4472136
#4         1.760682        0.4472136
#5         1.897367        0.4472136
#6         1.974842        0.6324555

mutate_all()

The mutate_all() function modifies all of the variables in a data frame at once, allowing you to perform a specific function on all of the variables by using the funs()function. The following code illustrates how to divide all of the columns in a data frame by 10 using mutate_all():

#define new data frame as the first six rows of iris without the Species variable
data2 <- head(iris) %>% select(-Species)

#view the new data frame
data2

#  Sepal.Length Sepal.Width Petal.Length Petal.Width
#1          5.1         3.5          1.4         0.2
#2          4.9         3.0          1.4         0.2
#3          4.7         3.2          1.3         0.2
#4          4.6         3.1          1.5         0.2
#5          5.0         3.6          1.4         0.2
#6          5.4         3.9          1.7         0.4

#divide all variables in the data frame by 10
data2 %>% mutate_all(funs(./10))

#  Sepal.Length Sepal.Width Petal.Length Petal.Width
#1         0.51        0.35         0.14        0.02
#2         0.49        0.30         0.14        0.02
#3         0.47        0.32         0.13        0.02
#4         0.46        0.31         0.15        0.02
#5         0.50        0.36         0.14        0.02
#6         0.54        0.39         0.17        0.04

Note that additional variables can be added to the data frame by specifying a new name to be appended to the old variable name:

data2 %>% mutate_all(funs(mod = ./10))

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Sepal.Length_mod
#1          5.1         3.5          1.4         0.2             0.51
#2          4.9         3.0          1.4         0.2             0.49
#3          4.7         3.2          1.3         0.2             0.47
#4          4.6         3.1          1.5         0.2             0.46
#5          5.0         3.6          1.4         0.2             0.50
#6          5.4         3.9          1.7         0.4             0.54
#  Sepal.Width_mod Petal.Length_mod Petal.Width_mod
#1            0.35             0.14            0.02
#2            0.30             0.14            0.02
#3            0.32             0.13            0.02
#4            0.31             0.15            0.02
#5            0.36             0.14            0.02
#6            0.39             0.17            0.04

mutate_at()

The mutate_at() function modifies specific variables by name. The following code illustrates how to divide two specific variables by 10 using mutate_at():

data2 %>% mutate_at(c("Sepal.Length", "Sepal.Width"), funs(mod = ./10))

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Sepal.Length_mod
#1          5.1         3.5          1.4         0.2             0.51
#2          4.9         3.0          1.4         0.2             0.49
#3          4.7         3.2          1.3         0.2             0.47
#4          4.6         3.1          1.5         0.2             0.46
#5          5.0         3.6          1.4         0.2             0.50
#6          5.4         3.9          1.7         0.4             0.54
#  Sepal.Width_mod
#1            0.35
#2            0.30
#3            0.32
#4            0.31
#5            0.36
#6            0.39

mutate_if()

The mutate_if() function modifies all variables that meet a certain condition. The following code illustrates how to use the mutate_if() function to convert any variables of type factor to type character:

#find variable type of each variable in a data frame
data <- head(iris)
sapply(data, class)

#Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
#   "numeric"    "numeric"    "numeric"    "numeric"     "factor" 

#convert any variable of type factor to type character
new_data <- data %>% mutate_if(is.factor, as.character)
sapply(new_data, class)

#Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
#   "numeric"    "numeric"    "numeric"    "numeric"  "character"

The following code illustrates how to use the mutate_if() function to round any variables of type numeric to one decimal place:

#define data as first six rows of iris dataset
data <- head(iris)

#view data
data

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4         0.2  setosa
#2          4.9         3.0          1.4         0.2  setosa
#3          4.7         3.2          1.3         0.2  setosa
#4          4.6         3.1          1.5         0.2  setosa
#5          5.0         3.6          1.4         0.2  setosa
#6          5.4         3.9          1.7         0.4  setosa

#round any variables of type numeric to one decimal place
data %>% mutate_if(is.numeric, round, digits = 0)

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1            5           4            1           0  setosa
#2            5           3            1           0  setosa
#3            5           3            1           0  setosa
#4            5           3            2           0  setosa
#5            5           4            1           0  setosa
#6            5           4            2           0  setosa

Further reading:
How to Manipulate, Aggregate, and Analyze Data in R Using dplyr
A Guide to apply(), lapply(), sapply(), and tapply() in R

Leave a Reply

Your email address will not be published. Required fields are marked *