This tutorial explains how to calculate the standard deviation in R, including an explanation of the formula used as well as several examples.
What is Standard Deviation?
The standard deviation is a common way to measure how “spread out” values are in a dataset. The formula to find the standard deviation of a sample is:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
where Σ is a fancy symbol that means “sum”, $x_i$ is the ith value in the dataset, $\bar{x}$ is the sample mean (the mean value of the dataset), and n is the sample size.
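To make the formula concrete, the following sketch works through each piece by hand and then compares the result with R's built-in sd() function. The vector x holds made-up values chosen for easy arithmetic, and the variable names are only illustrative:

```r
#hypothetical sample values chosen for easy arithmetic
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

#sample size and sample mean
n <- length(x)
x_bar <- mean(x)

#squared deviations from the mean
sq_dev <- (x - x_bar)^2

#sum the squared deviations, divide by n - 1, take the square root
s <- sqrt(sum(sq_dev) / (n - 1))
s
#[1] 2.13809

#compare with the built-in function
all.equal(s, sd(x))
#[1] TRUE
```

Here the mean is 5, the squared deviations sum to 32, and dividing by n - 1 = 7 before taking the square root gives roughly 2.138.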
How to Calculate Standard Deviation in R
We can use the built-in sd() function to easily calculate the standard deviation of a sample in R.
For example, the following code illustrates how to find the sample standard deviation of a dataset:
#create dataset
data <- c(1, 3, 4, 6, 11, 14, 17, 20, 22, 23)
#find standard deviation
sd(data)
#[1] 8.279157
Note that the standard deviation is equivalent to the square root of the variance:
sqrt(var(data))
#[1] 8.279157
Note that we could also write our own custom function to find the sample standard deviation:
#create custom function to find standard deviation
find_sd <- function(x) {
  sqrt(sum((x - mean(x))^2) / (length(x) - 1))
}
#find standard deviation
find_sd(data)
#[1] 8.279157
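As a quick sanity check, the sketch below redefines the same data and custom function so that it runs on its own, then confirms that all three approaches agree. It uses all.equal() rather than == because floating-point results that are mathematically equal can differ in their last few digits:

```r
#same example data and custom function as above
data <- c(1, 3, 4, 6, 11, 14, 17, 20, 22, 23)
find_sd <- function(x) {
  sqrt(sum((x - mean(x))^2) / (length(x) - 1))
}

#custom function vs. built-in sd()
all.equal(find_sd(data), sd(data))
#[1] TRUE

#square root of the variance vs. built-in sd()
all.equal(sqrt(var(data)), sd(data))
#[1] TRUE
```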
If a dataset contains missing values, sd() returns NA by default. To calculate the sample standard deviation of the non-missing values, we must specify na.rm = TRUE:
#create vector of values with NA
data_NA <- c(1, NA, 4, 6, NA, 14, 17, 20, 22, 23)
#attempt to find standard deviation
sd(data_NA)
#[1] NA
#find standard deviation by excluding missing values
sd(data_NA, na.rm = TRUE)
#[1] 8.61788
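A hand-written function needs the same care, since mean() and sum() also return NA when any value is missing. The sketch below shows one way to handle this. The helper name find_sd_na and its na.rm argument are hypothetical, written here only to mirror the behavior of sd():

```r
#same vector with missing values as above
data_NA <- c(1, NA, 4, 6, NA, 14, 17, 20, 22, 23)

#hypothetical helper: custom standard deviation with an na.rm switch
find_sd_na <- function(x, na.rm = FALSE) {
  if (na.rm) {
    #drop missing values before doing any arithmetic
    x <- x[!is.na(x)]
  }
  sqrt(sum((x - mean(x))^2) / (length(x) - 1))
}

#missing values propagate through the calculation
find_sd_na(data_NA)
#[1] NA

#excluding missing values gives the same result as sd()
find_sd_na(data_NA, na.rm = TRUE)
#[1] 8.61788

#count how many values were excluded
sum(is.na(data_NA))
#[1] 2
```

Note that once the missing values are dropped, the sample size n in the formula is 8 rather than 10.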
How to Calculate Several Standard Deviations in R At Once
In the previous examples, we showed how to find the standard deviation for a single vector of values. However, we can also use the sd() function to find the standard deviation of one or more variables in a dataset.
For example, consider the built-in R dataset mtcars:
#view first six lines of mtcars dataset
head(mtcars)
#                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
To find the standard deviation of the variable mpg, we can use the following code:
#find standard deviation of mpg
sd(mtcars$mpg)
#[1] 6.026948
We can also find the standard deviation of several variables at once by using the apply() function, where the second argument of 2 tells R to apply sd() to each column rather than each row. For example, the following code illustrates how to find the standard deviation of the variables mpg, cyl, and wt all at once:
#find standard deviation of mpg, cyl, and wt
apply(mtcars[ , c('mpg', 'cyl', 'wt')], 2, sd)
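An alternative sketch uses sapply(), which applies a function to each column of a data frame directly, without first converting the data frame to a matrix as apply() does. For these three columns the two approaches should give the same result (the names sd_sapply and sd_apply are only illustrative):

```r
#find standard deviation of mpg, cyl, and wt with sapply()
sd_sapply <- sapply(mtcars[ , c('mpg', 'cyl', 'wt')], sd)
sd_sapply
#      mpg       cyl        wt 
#6.0269481 1.7859216 0.9784574 

#compare with the apply() approach
sd_apply <- apply(mtcars[ , c('mpg', 'cyl', 'wt')], 2, sd)
all.equal(sd_sapply, sd_apply)
#[1] TRUE
```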
And we can find the standard deviation of every single variable in the dataset by using the following code:
#find standard deviation of all variables
apply(mtcars, 2, sd)
#        mpg         cyl        disp          hp        drat          wt
#  6.0269481   1.7859216 123.9386938  68.5628685   0.5346787   0.9784574
#       qsec          vs          am        gear        carb
#  1.7869432   0.5040161   0.4989909   0.7378041   1.6152000
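This works because every column in mtcars is numeric. When a data frame also contains text or factor columns, apply() converts everything to a character matrix first, so the results become unreliable. One possible workaround, sketched below, is to keep only the numeric columns before calling sd(). The data frame cars and its model column are hypothetical, created here just to illustrate the situation:

```r
#copy mtcars and add a text column (model is a hypothetical column name)
cars <- mtcars
cars$model <- rownames(mtcars)

#identify the numeric columns
is_num <- sapply(cars, is.numeric)

#find standard deviation of the numeric columns only
sd_numeric <- sapply(cars[ , is_num], sd)

#the results match those from the all-numeric mtcars
all.equal(sd_numeric, apply(mtcars, 2, sd))
#[1] TRUE
```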