How to Use the replace_na Function in R


Often you may want to replace missing values in a data frame or vector in R.

One of the easiest ways to do so is by using the replace_na() function from the tidyr package in R, which is designed to perform this exact task.

The replace_na() function uses the following syntax:

replace_na(data, replace, ...)

where:

  • data: A data frame or vector
  • replace: The value (or, for a data frame, a named list of values) used to replace the missing values

If you use replace_na() with a data frame, provide a named list that gives one replacement value for each column you want to fill. Columns that are not named in the list are left unchanged.

If you use replace_na() with a vector, provide a single value, which replaces every missing value in the vector.
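As a minimal sketch of the vector case, the snippet below fills a missing value in a character vector with a single string. The positions vector and the "unknown" label are made up for illustration and are not part of the examples that follow.

```r
library(tidyr)

# hypothetical character vector with one missing value
positions <- c("G", NA, "F")

# a vector takes one replacement value; here it is a string,
# matching the type of the vector being filled
positions_filled <- replace_na(positions, "unknown")

positions_filled
```

The replacement value should have the same type as the vector (a string for a character vector, a number for a numeric one).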

The following examples show how to use the replace_na() function in practice to replace values in both vectors and data frames in R.

Example 1: How to Use replace_na to Replace Missing Values in a Vector

Suppose we create the following vector in R that contains the number of points scored by various basketball players in a game:

#create vector
my_data <- c(4, 5, 19, 30, NA, NA, 14, 19, 30, NA, 8, 12)

#view vector
my_data

 [1]  4  5 19 30 NA NA 14 19 30 NA  8 12

We can see that three of the values in the vector are NA, which R uses to represent a missing value.

Suppose that we would like to fill in each of these missing values in the vector with a specific value.

We can use the following syntax with the replace_na() function to replace each missing value with 0:

library(tidyr)

#replace all missing values in vector with 0
my_data <- replace_na(my_data, 0)

#view updated vector
my_data

 [1]  4  5 19 30  0  0 14 19 30  0  8 12

Notice that each missing value in the vector has been replaced with the value 0.
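For comparison, the same replacement can be written in base R with is.na() and logical indexing. The sketch below rebuilds the original vector under the hypothetical name pts so that it runs on its own, and checks that both approaches agree.

```r
library(tidyr)

# rebuild the original vector (pts is a hypothetical name used only here)
pts <- c(4, 5, 19, 30, NA, NA, 14, 19, 30, NA, 8, 12)

# tidyr approach: returns a new vector with NA replaced by 0
pts_tidyr <- replace_na(pts, 0)

# base R approach: overwrite the NA positions in a copy
pts_base <- pts
pts_base[is.na(pts_base)] <- 0

# TRUE if both approaches give the same vector
identical(pts_tidyr, pts_base)
```

The base R form modifies the vector in place, while replace_na() returns a new vector and fits more naturally into a pipeline.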

Example 2: How to Use replace_na to Replace Missing Values in a Data Frame

Suppose we create the following data frame in R that contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(12, NA, 20, 40, 34, NA, 28, 19),
                 assists=c(NA, 4, 5, 9, 12, 0, 4, NA))

#view data frame
df

  team points assists
1    A     12      NA
2    A     NA       4
3    A     20       5
4    A     40       9
5    B     34      12
6    B     NA       0
7    B     28       4
8    B     19      NA

We can see that the points and assists columns each contain two NA values.

Suppose that we would like to make the following replacements:

  • Replace missing values in the points column with the value 20.
  • Replace missing values in the assists column with the value 10.

We can use the following syntax with the replace_na() function to do so:

#replace missing values in columns with specific values
df <- replace_na(df, list(points=20, assists=10))

#view updated data frame
df

  team points assists
1    A     12      10
2    A     20       4
3    A     20       5
4    A     40       9
5    B     34      12
6    B     20       0
7    B     28       4
8    B     19      10

Notice that all missing values in the points column have been replaced with 20 and all missing values in the assists column have been replaced with 10, just as we specified.
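The replacement values do not have to be hard-coded. As a sketch, the snippet below rebuilds the original data frame under the hypothetical name df2 and fills the points column with the mean of its non-missing values. Because only points is named in the list, the assists column keeps its NA values.

```r
library(tidyr)

# rebuild the original data frame (df2 is a hypothetical name used only here)
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                  points=c(12, NA, 20, 40, 34, NA, 28, 19),
                  assists=c(NA, 4, 5, 9, 12, 0, 4, NA))

# mean of the non-missing points values
points_mean <- mean(df2$points, na.rm = TRUE)

# only points is listed, so assists is left untouched
df2 <- replace_na(df2, list(points = points_mean))

df2
```

Filling with a column mean is only one possible choice, and whether it is appropriate depends on the data.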

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Use str_replace in R
How to Use str_split in R
How to Use str_detect in R
How to Use str_count in R