How to Use na.rm in R (With Examples)


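You can use the argument na.rm = TRUE to exclude missing values (NA) when calculating descriptive statistics in R. The default for these functions is na.rm = FALSE, so missing values are kept unless you ask for them to be removed.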

#calculate mean and exclude missing values
mean(x, na.rm = TRUE)

#calculate sum and exclude missing values 
sum(x, na.rm = TRUE)

#calculate maximum and exclude missing values 
max(x, na.rm = TRUE)

#calculate standard deviation and exclude missing values 
sd(x, na.rm = TRUE)

The following examples show how to use this argument in practice with both vectors and data frames.

Example 1: Use na.rm with Vectors

Suppose we attempt to calculate the mean, sum, max, and standard deviation for the following vector in R that contains some missing values:

#define vector with some missing values
x <- c(3, 4, 5, 5, 7, NA, 12, NA, 16)

mean(x)

[1] NA

sum(x)

[1] NA

max(x)

[1] NA

sd(x)

[1] NA

Each of these functions returns NA because, with the default na.rm = FALSE, a single missing value in the input makes the result missing.
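
Before removing anything, it can help to confirm that missing values really are present and to count them. The following is a minimal sketch using the base R functions anyNA() and is.na(); the names has_missing and n_missing are only illustrative:

```r
#define vector with some missing values
x <- c(3, 4, 5, 5, 7, NA, 12, NA, 16)

#check whether the vector contains at least one missing value
has_missing <- anyNA(x)

#count the missing values (is.na() returns TRUE for each NA)
n_missing <- sum(is.na(x))

has_missing

n_missing
```

For this vector, has_missing is TRUE and n_missing is 2.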

To exclude missing values when performing these calculations, we can simply include the argument na.rm = TRUE as follows:

#define vector with some missing values
x <- c(3, 4, 5, 5, 7, NA, 12, NA, 16)

mean(x, na.rm = TRUE)

[1] 7.428571

sum(x, na.rm = TRUE)

[1] 52

max(x, na.rm = TRUE)

[1] 16

sd(x, na.rm = TRUE)

[1] 4.790864

Notice that each function now returns a numeric result, calculated from the seven non-missing values in the vector.
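
Passing na.rm = TRUE gives the same result as dropping the missing values yourself before the calculation. Here is a minimal sketch of that equivalence, using is.na() to subset the vector (the names x_complete, mean_narm, and mean_manual are only illustrative):

```r
#define vector with some missing values
x <- c(3, 4, 5, 5, 7, NA, 12, NA, 16)

#keep only the non-missing values
x_complete <- x[!is.na(x)]

#mean with na.rm = TRUE
mean_narm <- mean(x, na.rm = TRUE)

#mean of the manually filtered vector
mean_manual <- mean(x_complete)

#compare the two results
all.equal(mean_narm, mean_manual)
```

Both approaches divide the sum of 52 by the 7 remaining values, not by the original length of 9, which is why the mean is 7.428571.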

Example 2: Use na.rm with Data Frames

Suppose we have the following data frame in R that contains some missing values:

#create data frame
df <- data.frame(var1=c(1, 3, 3, 4, 5),
                 var2=c(7, 7, NA, 3, 2),
                 var3=c(3, 3, NA, 6, 8),
                 var4=c(1, 1, 2, 8, NA))

#view data frame
df

  var1 var2 var3 var4
1    1    7    3    1
2    3    7    3    1
3    3   NA   NA    2
4    4    3    6    8
5    5    2    8   NA

We can use the apply() function to calculate descriptive statistics for each column in the data frame. The second argument, 2, tells apply() to work column by column, and any extra arguments such as na.rm = TRUE are passed on to the summary function so that missing values are excluded:

#calculate mean of each column
apply(df, 2, mean, na.rm = TRUE)

var1 var2 var3 var4 
3.20 4.75 5.00 3.00 

#calculate sum of each column
apply(df, 2, sum, na.rm = TRUE)

var1 var2 var3 var4 
  16   19   20   12 

#calculate max of each column
apply(df, 2, max, na.rm = TRUE)

var1 var2 var3 var4 
   5    7    8    8 

#calculate standard deviation of each column
apply(df, 2, sd, na.rm = TRUE)

    var1     var2     var3     var4 
1.483240 2.629956 2.449490 3.366502

Once again, we were able to complete each calculation successfully while excluding the missing values.
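
As an alternative to apply(), base R also provides colMeans() and colSums(), which accept na.rm directly, and sapply(), which passes extra arguments through to the function it calls. The sketch below is one possible variation on the example above, not the only way to do it; the names col_means, col_sums, and col_sds are only illustrative:

```r
#create data frame
df <- data.frame(var1=c(1, 3, 3, 4, 5),
                 var2=c(7, 7, NA, 3, 2),
                 var3=c(3, 3, NA, 6, 8),
                 var4=c(1, 1, 2, 8, NA))

#calculate mean of each column
col_means <- colMeans(df, na.rm = TRUE)

#calculate sum of each column
col_sums <- colSums(df, na.rm = TRUE)

#calculate standard deviation of each column
col_sds <- sapply(df, sd, na.rm = TRUE)

col_means

col_sums

col_sds
```

One reason to consider sapply() is that apply() first converts the data frame to a matrix, which only works cleanly when every column is numeric, whereas sapply() handles each column separately.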

Additional Resources

How to Sum Specific Columns in R
How to Calculate the Mean of Multiple Columns in R
How to Find the Max Value Across Multiple Columns in R
