How to Replace Missing Values in R (With Examples)


One of the most common data cleaning tasks in R is replacing missing values (NA) with new values.

You can use the following methods to do so:

Method 1: Replace Missing Values in Vector

#replace all NA values in vector with zero
my_vector[is.na(my_vector)] <- 0

Method 2: Replace Missing Values in All Columns of Data Frame

library(dplyr)

#replace all NA values in every column of data frame with zero
df <- df %>% replace(is.na(.), 0)

Method 3: Replace Missing Values in Specific Column of Data Frame

library(dplyr)

#replace NA values with zero in column named col1
df <- df %>% mutate(col1 = ifelse(is.na(col1), 0, col1))

The following examples show how to use each of these methods in practice.

Example 1: Replace Missing Values in Vector

Suppose that we create the following vector named my_vector in R:

#create vector with some missing values
my_vector <- c(22, 14, NA, 5, NA, 7, 11, 9, NA, 18, 22, 24, 46)

Notice that there are several missing values in the vector.
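
Before replacing anything, it can help to confirm how many values are missing and where they sit. The following is a minimal sketch using the base R functions is.na(), sum(), and which(); the names n_missing and missing_positions are illustrative only and do not appear elsewhere in this tutorial.

```r
#recreate the example vector
my_vector <- c(22, 14, NA, 5, NA, 7, 11, 9, NA, 18, 22, 24, 46)

#count the missing values (illustrative variable name)
n_missing <- sum(is.na(my_vector))

#find the positions of the missing values (illustrative variable name)
missing_positions <- which(is.na(my_vector))

#view results
n_missing
missing_positions
```

For this vector the count is 3 and the missing values sit at positions 3, 5, and 9.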

We can use the following syntax to replace each of the missing values with zero instead:

#replace all NA values in vector with zero
my_vector[is.na(my_vector)] <- 0

#view updated vector
my_vector

 [1] 22 14  0  5  0  7 11  9  0 18 22 24 46

Notice that each of the missing values in the vector has been replaced with zero.

It’s worth noting that you can choose to replace missing values with any value that you would like.

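For example, after recreating the original vector (its NA values were already overwritten with zeros above), we can use the following syntax to replace all missing values in the vector with 100 instead:

#recreate original vector with missing values
my_vector <- c(22, 14, NA, 5, NA, 7, 11, 9, NA, 18, 22, 24, 46)

#replace all NA values in vector with 100
my_vector[is.na(my_vector)] <- 100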

#view updated vector
my_vector

 [1]  22  14 100   5 100   7  11   9 100  18  22  24  46

Notice that each of the missing values in the vector has been replaced with the value 100.
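
The replacement value does not have to be a fixed number. As one possible illustration, the sketch below fills the missing values with the mean of the non-missing values, computed with mean() and na.rm = TRUE. The name vector_mean is an assumption made for this example, and whether the mean is a sensible fill value depends on your data.

```r
#recreate the original vector with missing values
my_vector <- c(22, 14, NA, 5, NA, 7, 11, 9, NA, 18, 22, 24, 46)

#compute the mean of the non-missing values (illustrative variable name)
vector_mean <- mean(my_vector, na.rm = TRUE)

#replace all NA values in vector with that mean
my_vector[is.na(my_vector)] <- vector_mean

#view updated vector
my_vector
```

Here the ten non-missing values sum to 178, so each NA is replaced with 17.8.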

Example 2: Replace Missing Values in All Columns of Data Frame

Suppose we have the following data frame in R that contains information about various basketball players:

#create data frame with some missing values
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, NA, NA, 88, 95, 74, NA, 93),
                 assists=c(22, 28, 31, NA, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, NA, 29))

#view data frame
df

  team points assists rebounds
1    A     99      22       30
2    A     NA      28       28
3    A     NA      31       24
4    A     88      NA       24
5    B     95      34       30
6    B     74      45       36
7    B     NA      28       NA
8    B     93      31       29

We can use the following syntax to replace the missing values with zero in each column of the data frame:

library(dplyr)

#replace missing values in each column with zero
df <- df %>% replace(is.na(.), 0)

#view updated data frame
df

  team points assists rebounds
1    A     99      22       30
2    A      0      28       28
3    A      0      31       24
4    A     88       0       24
5    B     95      34       30
6    B     74      45       36
7    B      0      28        0
8    B     93      31       29

Notice that every missing value in each column has been replaced with zero.
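
If you would rather not load dplyr, the same result can likely be reached with base R alone. The sketch below indexes the data frame with the logical matrix returned by is.na(df) and assigns zero to every TRUE position. It is offered as an alternative under the assumption that the same example data frame is used, not as the method this tutorial relies on.

```r
#create data frame with some missing values
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, NA, NA, 88, 95, 74, NA, 93),
                 assists=c(22, 28, 31, NA, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, NA, 29))

#replace missing values in each column with zero using base R only
df[is.na(df)] <- 0

#confirm that no missing values remain
sum(is.na(df))
```

Columns without any missing values, such as team, are left as they were.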

Example 3: Replace Missing Values in Specific Column of Data Frame

Once again, suppose we have the following data frame in R that contains information about various basketball players:

#create data frame with some missing values
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, NA, NA, 88, 95, 74, NA, 93),
                 assists=c(22, 28, 31, NA, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, NA, 29))

#view data frame
df

  team points assists rebounds
1    A     99      22       30
2    A     NA      28       28
3    A     NA      31       24
4    A     88      NA       24
5    B     95      34       30
6    B     74      45       36
7    B     NA      28       NA
8    B     93      31       29

We can use the following syntax to replace the missing values with zero in only the points column of the data frame:

library(dplyr)

#replace missing values in points column with zero
df <- df %>% mutate(points = ifelse(is.na(points), 0, points))

#view updated data frame
df

  team points assists rebounds
1    A     99      22       30
2    A      0      28       28
3    A      0      31       24
4    A     88      NA       24
5    B     95      34       30
6    B     74      45       36
7    B      0      28       NA
8    B     93      31       29

Notice that each of the missing values in the points column has been replaced with zero.

All other columns with missing values have been left untouched.
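
A single column can also be updated without dplyr by applying the vector technique from Example 1 to df$points. The sketch below assumes the same example data frame. One reason to consider it is that ifelse() can drop attributes such as a Date class, while direct assignment keeps the column's type.

```r
#create data frame with some missing values
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, NA, NA, 88, 95, 74, NA, 93),
                 assists=c(22, 28, 31, NA, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, NA, 29))

#replace missing values in points column with zero using base R only
df$points[is.na(df$points)] <- 0

#count the missing values that remain in each column
colSums(is.na(df))
```

The assists and rebounds columns should each still contain one missing value, since only points was modified.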

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Find and Count Missing Values in R
How to Interpolate Missing Values in R
How to Find Columns with All Missing Values in R
