How to Calculate Standard Deviation in R: Explanation & Examples

This tutorial explains how to calculate the standard deviation in R, including an explanation of the formula used as well as several examples.

What is Standard Deviation?

The standard deviation is a common way to measure how “spread out” values are in a dataset. The formula to find the standard deviation of a sample is:

s = √( Σ (xi – μ)² / (n – 1) )

where Σ is a fancy symbol that means “sum”, xi is the ith value in the dataset, μ is the mean value of the dataset, and n is the sample size.
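
To make the formula concrete, here is a minimal sketch that works through it one step at a time: take the mean, square each deviation from it, sum the squares, divide by n – 1, and take the square root. The vector x and the intermediate names (n, mu, sq_dev, s) are hypothetical and chosen only for illustration.

```r
# hypothetical example values (not part of the tutorial's dataset)
x <- c(2, 4, 4, 6)

n <- length(x)          # sample size: 4
mu <- mean(x)           # mean of the values: 4
sq_dev <- (x - mu)^2    # squared deviations from the mean: 4 0 0 4

# sum the squared deviations, divide by n - 1, then take the square root
s <- sqrt(sum(sq_dev) / (n - 1))
s

#[1] 1.632993
```

The result is sqrt(8 / 3), which is the same value the built-in sd() function returns for this vector.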

How to Calculate Standard Deviation in R

We can use the built-in sd() function to easily calculate the standard deviation of a sample in R.

For example, the following code illustrates how to find the sample standard deviation of a dataset:

#create dataset
data <- c(1, 3, 4, 6, 11, 14, 17, 20, 22, 23)

#find standard deviation
sd(data)

#[1] 8.279157

Note that the standard deviation is equivalent to the square root of the variance:

sqrt(var(data))

#[1] 8.279157

Note that we could also write our own custom function to find the sample standard deviation:

#create custom function to find standard deviation
find_sd <- function(x) {
  sqrt(sum((x-mean(x))^2/(length(x)-1)))
}

#find standard deviation
find_sd(data)

#[1] 8.279157
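
As a quick check that the three approaches above agree, we can compare them directly. This sketch uses all.equal() rather than ==, on the assumption that tiny floating-point differences between the calculations should not count as a mismatch. The dataset and find_sd() are repeated so the snippet runs on its own.

```r
# same dataset and custom function as above
data <- c(1, 3, 4, 6, 11, 14, 17, 20, 22, 23)

find_sd <- function(x) {
  sqrt(sum((x - mean(x))^2 / (length(x) - 1)))
}

# compare the built-in result with the square root of the variance
same_as_var <- all.equal(sd(data), sqrt(var(data)))
same_as_var

#[1] TRUE

# compare the built-in result with the custom function
same_as_custom <- all.equal(sd(data), find_sd(data))
same_as_custom

#[1] TRUE
```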

If the dataset contains missing values, sd() returns NA unless we specify na.rm = TRUE, which drops the missing values before the calculation:

#create vector of values with NA
data_NA <- c(1, NA, 4, 6, NA, 14, 17, 20, 22, 23)

#attempt to find standard deviation
sd(data_NA)

#[1] NA

#find standard deviation by excluding missing values
sd(data_NA, na.rm = TRUE)

#[1] 8.61788
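
Before dropping missing values, it can help to see how many there are. The sketch below counts them with is.na() and shows that removing them by hand gives the same result as na.rm = TRUE. The names n_missing and complete are hypothetical and used only for illustration.

```r
# same vector with missing values as above
data_NA <- c(1, NA, 4, 6, NA, 14, 17, 20, 22, 23)

# count the missing values
n_missing <- sum(is.na(data_NA))
n_missing

#[1] 2

# drop the missing values manually, then find the standard deviation
complete <- data_NA[!is.na(data_NA)]
sd(complete)

#[1] 8.61788
```

Note that the standard deviation is then based on the 8 remaining values rather than the original 10.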

How to Calculate Several Standard Deviations in R At Once

In the previous examples, we showed how to find the standard deviation for a single vector of values. However, we can also use the sd() function to find the standard deviation of one or more variables in a dataset.

For example, consider the built-in R dataset mtcars:

#view first six lines of mtcars dataset
head(mtcars)

#                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

To find the standard deviation of the variable mpg, we can use the following code:

#find standard deviation of mpg
sd(mtcars$mpg)

#[1] 6.026948
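
A related case is finding the standard deviation of one variable within groups. As a minimal sketch, assuming we want the spread of mpg separately for each number of cylinders, base R's tapply() applies sd() to each group. The name sd_by_cyl is hypothetical.

```r
# find standard deviation of mpg for each value of cyl
sd_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, sd)
sd_by_cyl

# the entry for 4-cylinder cars matches a manual subset
sd(mtcars$mpg[mtcars$cyl == 4])
```

The result has one entry per distinct value of cyl (4, 6, and 8 in this dataset).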

We can also find the standard deviation of several variables at once by using the apply() function. For example, the following code illustrates how to find the standard deviation of the variables mpg, cyl, and wt all at once:

#find standard deviation of mpg, cyl, and wt
apply(mtcars[ , c('mpg', 'cyl', 'wt')], 2, sd)

#      mpg       cyl        wt 
#6.0269481 1.7859216 0.9784574 

And we can find the standard deviation of every single variable in the dataset by using the following code:

#find standard deviation of all variables
apply(mtcars, 2, sd)

#        mpg         cyl        disp          hp        drat          wt 
#  6.0269481   1.7859216 123.9386938  68.5628685   0.5346787   0.9784574 
#       qsec          vs          am        gear        carb 
#  1.7869432   0.5040161   0.4989909   0.7378041   1.6152000 
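
One caveat: apply() first converts a data frame to a matrix, so if any column is non-numeric, every column is converted to character and sd() no longer receives numeric input. This works for mtcars because all of its columns are numeric. As a hedged alternative, the sketch below uses sapply(), which works through a data frame one column at a time, and keeps only the numeric columns first. The built-in iris dataset is used as an assumed example of a data frame with a non-numeric column, and the names mt_sd, numeric_cols, and iris_sd are hypothetical.

```r
# sapply() on the same three mtcars columns, without converting to a matrix
mt_sd <- sapply(mtcars[, c("mpg", "cyl", "wt")], sd)
mt_sd

# iris has a factor column (Species), so keep only the numeric columns
numeric_cols <- sapply(iris, is.numeric)

# find standard deviation of each numeric column
iris_sd <- sapply(iris[numeric_cols], sd)
iris_sd
```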

Further Reading:
Measuring Spread – Range, Interquartile Range, Variance, and Standard Deviation
How to Explore a Dataset in R Using Descriptive Statistics
