Often when you’re working with datasets, you’ll encounter missing values. R represents missing values as **NA**, although some datasets code them with a placeholder value such as **999**.

Fortunately it’s easy to identify, modify, and exclude missing values in R and this tutorial explains how to do so.

**How to Identify Missing Values**

The most straightforward way to identify missing values in a vector, list, matrix, or data frame is by using **is.na()**, which returns TRUE or FALSE for each value in the data structure.

The following code shows how to test whether each value in a vector is missing or not:

```r
#create a vector x with some missing values
x <- c(4, NA, 12, NA, 19, 34, 22)

#test whether each value is missing or not
is.na(x)
#[1] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE

#identify the position of the missing values in the vector
which(is.na(x))
#[1] 2 4

#identify total number of missing values in the vector
sum(is.na(x))
#[1] 2
```

This next chunk of code shows how to test whether each value in a data frame is missing or not:

```r
#create a data frame with three columns and five rows
data <- data.frame(a = c(2, 5, 12, NA, NA),
                   b = c(15, NA, 4, 3, 1),
                   c = c(21, 23, NA, 9, 4))
data
#   a  b  c
#1  2 15 21
#2  5 NA 23
#3 12  4 NA
#4 NA  3  9
#5 NA  1  4

#identify all missing values in entire data frame
is.na(data)
#         a     b     c
#[1,] FALSE FALSE FALSE
#[2,] FALSE  TRUE FALSE
#[3,] FALSE FALSE  TRUE
#[4,]  TRUE FALSE FALSE
#[5,]  TRUE FALSE FALSE

#identify total number of missing values in entire data frame
sum(is.na(data))
#[1] 4

#identify total number of missing values in column named 'c'
sum(is.na(data$c))
#[1] 1

#identify total number of missing values in each column
colSums(is.na(data))
#a b c
#2 1 1
```
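A few other base R helpers can complement is.na(). The following is a minimal sketch that reuses the data frame above; the variable names `missing_per_row` and `share_missing` are illustrative only.

```r
#same data frame as in the example above
data <- data.frame(a = c(2, 5, 12, NA, NA),
                   b = c(15, NA, 4, 3, 1),
                   c = c(21, 23, NA, 9, 4))

#check whether column 'b' contains any missing value at all
anyNA(data$b)
#[1] TRUE

#count the missing values in each row
missing_per_row <- rowSums(is.na(data))
missing_per_row
#[1] 0 1 1 1 1

#proportion of missing values in each column
share_missing <- colMeans(is.na(data))
share_missing
#  a   b   c
#0.4 0.2 0.2
```

Because TRUE counts as 1 and FALSE as 0, taking the mean of the logical matrix gives the share of missing values directly.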

**How to Modify Missing Values**

Once you’ve identified missing values in a vector or data frame, you may then want to replace the missing values with some other value like the median or the mean.

The following code shows how to replace missing values in a vector:

```r
#create a vector x with some missing values
x <- c(4, NA, 12, NA, 19, 34, 22)

#replace missing values with the mean value of the vector
x[is.na(x)] <- mean(x, na.rm = TRUE)
x
#[1]  4.0 18.2 12.0 18.2 19.0 34.0 22.0

#Alternatively, recreate the vector (the NAs above have already been filled)
#and replace missing values with the median value of the vector
x <- c(4, NA, 12, NA, 19, 34, 22)
x[is.na(x)] <- median(x, na.rm = TRUE)
x
#[1]  4 19 12 19 19 34 22
```

This next chunk of code shows how to replace missing values in a data frame:

```r
#create a data frame with missing values coded as '999'
data <- data.frame(a = c(2, 5, 12, 999, 999),
                   b = c(15, 999, 4, 3, 1),
                   c = c(21, 23, 999, 9, 4))
data
#    a   b   c
#1   2  15  21
#2   5 999  23
#3  12   4 999
#4 999   3   9
#5 999   1   4

#replace the 999 placeholder values with NAs
data[data == 999] <- NA
data
#   a  b  c
#1  2 15 21
#2  5 NA 23
#3 12  4 NA
#4 NA  3  9
#5 NA  1  4

#replace missing values in column 'c' with the average value of column 'c'
data$c[is.na(data$c)] <- mean(data$c, na.rm = TRUE)
data
#   a  b     c
#1  2 15 21.00
#2  5 NA 23.00
#3 12  4 14.25
#4 NA  3  9.00
#5 NA  1  4.00
```
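To fill the missing values in every column rather than one column at a time, the same replacement can be applied column by column. This is a minimal sketch that assumes all columns are numeric; `fill_with_mean` is a hypothetical helper written for this example, not a built-in function.

```r
#data frame with missing values in several columns
data <- data.frame(a = c(2, 5, 12, NA, NA),
                   b = c(15, NA, 4, 3, 1),
                   c = c(21, 23, NA, 9, 4))

#hypothetical helper: fill the NAs of one numeric vector with its mean
fill_with_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

#apply the helper to every column; assigning into data[] keeps the data frame structure
data[] <- lapply(data, fill_with_mean)

#confirm that no missing values remain in any column
colSums(is.na(data))
#a b c
#0 0 0
```

Each column is filled with its own mean, so the missing entry in column 'b' becomes 5.75 and the one in column 'c' becomes 14.25.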

**How to Exclude Missing Values**

If you would instead like to exclude missing values, rather than modify or replace them, then you can do so fairly easily in R.

When performing operations on vectors (like finding the sum, mean, median, max, min, etc.), the easiest way to exclude missing values is to use the argument **na.rm = TRUE**. If missing values are present and you omit this argument, these functions return NA.

The following code shows how to use na.rm = TRUE to find the mean and the median of a vector:

```r
#create a vector x with some missing values
x <- c(4, NA, 12, NA, 19, 34, 22)

#find mean of vector without using na.rm = TRUE
mean(x)
#[1] NA

#find mean of vector using na.rm = TRUE to exclude missing values
mean(x, na.rm = TRUE)
#[1] 18.2

#find median of vector without using na.rm = TRUE
median(x)
#[1] NA

#find median of vector using na.rm = TRUE to exclude missing values
median(x, na.rm = TRUE)
#[1] 19
```
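The same argument is accepted by other base R summary functions. As a brief sketch, the code below applies it to sum() and max() on a vector and to colMeans() on a data frame; the object names `total`, `largest`, and `col_means` are illustrative only.

```r
#vector with some missing values
x <- c(4, NA, 12, NA, 19, 34, 22)

#sum and maximum of the non-missing values
total <- sum(x, na.rm = TRUE)
total
#[1] 91

largest <- max(x, na.rm = TRUE)
largest
#[1] 34

#column means of a data frame, ignoring missing values in each column
data <- data.frame(a = c(2, 5, 12, NA, NA),
                   b = c(15, NA, 4, 3, 1),
                   c = c(21, 23, NA, 9, 4))
col_means <- colMeans(data, na.rm = TRUE)

#column 'a' averages about 6.33, 'b' is 5.75, and 'c' is 14.25
col_means
```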

When dealing with data frames, you may only want to look at the rows where there are no missing values. There are two common methods for excluding rows that contain missing values:

**complete.cases()** – returns a logical vector that is TRUE for each row of a data frame that is “complete”, i.e. has no missing values. Using it to index the data frame keeps only those rows. The syntax for using this is as follows:

**data[complete.cases(data), ]**

**na.omit()** – omits any rows in a data frame that have missing values. The syntax for using this is as follows:

**na.omit(data)**

Notice that, for a data frame, these two approaches keep the same rows.

The following code illustrates these two functions in action:

```r
#create a data frame with some missing values
data <- data.frame(a = c(2, 5, 12, NA, NA),
                   b = c(15, NA, 4, 3, 1),
                   c = c(21, 23, NA, 9, 4))
data
#   a  b  c
#1  2 15 21
#2  5 NA 23
#3 12  4 NA
#4 NA  3  9
#5 NA  1  4

#only look at rows where there is no missing data
data[complete.cases(data), ]
#  a  b  c
#1 2 15 21

#omit all rows with missing data
na.omit(data)
#  a  b  c
#1 2 15 21
```
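Two small differences between the approaches may be useful in practice. complete.cases() can be given a subset of columns, and na.omit() records which rows it dropped. The following sketch illustrates both; the object names `kept` and `cleaned` are illustrative only.

```r
#same data frame as in the example above
data <- data.frame(a = c(2, 5, 12, NA, NA),
                   b = c(15, NA, 4, 3, 1),
                   c = c(21, 23, NA, 9, 4))

#keep rows that are complete in columns 'a' and 'b' only,
#ignoring missing values in column 'c'
kept <- data[complete.cases(data[, c("a", "b")]), ]
kept
#   a  b  c
#1  2 15 21
#3 12  4 NA

#na.omit() stores the positions of the rows it removed
cleaned <- na.omit(data)
as.integer(na.action(cleaned))
#[1] 2 3 4 5
```

Restricting the check to certain columns keeps more rows when only those columns matter for the analysis at hand.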