Suppose we have the following data frame in R:
#create a data frame with three columns and five rows
data <- data.frame(a = c(1, 2, 3, 4, 5),
                   b = c(6, 7, 8, 9, 10),
                   c = c(11, 12, 13, 14, 15))

data

#  a  b  c
#1 1  6 11
#2 2  7 12
#3 3  8 13
#4 4  9 14
#5 5 10 15
In order to find the standard deviation of each column in this data frame, we can use the following piece of code:
#find standard deviation of each column
apply(data, 2, sd)

#       a        b        c
#1.581139 1.581139 1.581139
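This returns a named numeric vector of three values, one standard deviation per column of the data frame. The three values are identical because each column is the same sequence shifted by a constant, and shifting values does not change their spread.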
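As a quick sanity check, the value for column a can be reproduced by hand. The sketch below assumes the sample standard deviation formula (dividing by n - 1), which is what sd() computes; the names x, n, and manual_sd are illustrative and not part of the original example.

```r
# values of column a from the example data frame
x <- c(1, 2, 3, 4, 5)
n <- length(x)

# sample standard deviation: square root of the
# sum of squared deviations from the mean, divided by n - 1
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))

manual_sd
# [1] 1.581139

# compare with the built-in function (all.equal tolerates floating-point error)
all.equal(manual_sd, sd(x))
# [1] TRUE
```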
The call apply(data, 2, sd) relies on the built-in R function apply(), which applies a function to the rows or columns of a matrix or data frame.
The basic syntax for the apply() function is as follows:
apply(X, MARGIN, FUN)
- X is the matrix or data frame to operate on
- MARGIN indicates the dimension to apply the function over (1 = rows, 2 = columns)
- FUN is the function to apply (e.g. min, max, sum, mean, sd)
In this case X = data, MARGIN = 2 (for columns), and FUN = sd (for standard deviation).
Thus, apply(data, 2, sd) allowed us to find the standard deviation for each column in our data frame.
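To illustrate the MARGIN argument, here is a minimal sketch that runs the same function across rows instead of columns. It rebuilds the data frame from above so that it runs on its own; col_sd and row_sd are names chosen for this example.

```r
# same data frame as in the example above
data <- data.frame(a = c(1, 2, 3, 4, 5),
                   b = c(6, 7, 8, 9, 10),
                   c = c(11, 12, 13, 14, 15))

# MARGIN = 2: one standard deviation per column
col_sd <- apply(data, 2, sd)

# MARGIN = 1: one standard deviation per row
row_sd <- apply(data, 1, sd)

# every row has the form (k, k + 5, k + 10), so each row has a standard deviation of 5
unname(row_sd)
# [1] 5 5 5 5 5
```

With MARGIN = 1 the result has one value per row (five here), whereas MARGIN = 2 gives one value per column (three here).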