How to Use the describe() Function in R


Often you may want to calculate descriptive statistics for each column in a data frame in R.

One of the easiest ways to do so is with the describe() function from the psych package.

The describe() function uses the following syntax:

describe(x, na.rm=TRUE, interp=FALSE, skew = TRUE, ranges = TRUE, …)

where:

  • x: The data frame or matrix to summarize
  • na.rm: Whether NA values should be removed when calculating statistics
  • interp: Whether the median should be standard or interpolated
  • skew: Whether the skewness and kurtosis should be calculated
  • ranges: Whether the range should be calculated
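
As a rough sketch of how these arguments fit together, the following call switches off the skewness, kurtosis, and range statistics. The scores data frame and its column names are made up for illustration, and the call only runs if the psych package is installed:

```r
#create small made-up data frame with one missing value (hypothetical data)
scores <- data.frame(x=c(4, 8, NA, 15, 16),
                     y=c(2, 3, 5, 7, 11))

#run describe() only if the psych package is available
if (requireNamespace("psych", quietly = TRUE)) {
  #remove NA values, skip skewness/kurtosis and range statistics
  print(psych::describe(scores, na.rm=TRUE, skew=FALSE, ranges=FALSE))
}
```

Setting skew and ranges to FALSE should produce a shorter summary, which can be convenient for data frames with many columns.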

The following example shows how to use the describe() function in practice to calculate descriptive statistics for each column in a data frame in R.

Note: Before using the describe() function, you may need to first install the psych package. You can use the following syntax to do so:

install.packages('psych')

Once the psych package has been installed, you can proceed to use the describe() function.

Example: How to Use the describe() Function in R

Suppose that we create the following data frame in R that contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#view data frame
df

  team points assists rebounds
1    A     99      22       30
2    A     68      28       28
3    A     86      31       24
4    A     88      35       24
5    B     95      34       30
6    B     74      45       36
7    B     78      28       30
8    B     93      31       29

The data frame contains the following information about eight different basketball players:

  • team: The team they are on.
  • points: Their total points scored in the season.
  • assists: Their total assists in the season.
  • rebounds: Their total rebounds in the season.

Suppose that we would like to calculate descriptive statistics for each of these variables at once, including the mean, median, range, etc.

We can use the following syntax with the describe() function to do so:

library(psych)

#calculate descriptive statistics for each variable in data frame
describe(df)

         vars n  mean    sd median trimmed   mad min max range  skew kurtosis
team*       1 8  1.50  0.53    1.5    1.50  0.74   1   2     1  0.00    -2.23
points      2 8 85.12 10.88   87.0   85.12 12.60  68  99    31 -0.25    -1.62
assists     3 8 31.75  6.71   31.0   31.75  4.45  22  45    23  0.55    -0.51
rebounds    4 8 28.88  3.83   29.5   28.88  1.48  24  36    12  0.30    -0.85
           se
team*    0.19
points   3.85
assists  2.37
rebounds 1.36

The describe() function returns a variety of descriptive statistics for each variable.

Note: By default, the describe() function attempts to calculate descriptive statistics for all variables, even ones that are not numeric. Non-numeric columns are converted to numeric codes and flagged with an asterisk (*) in the output. In this example the team column is a character variable, so the values in the team* row of the output are not meaningful.
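
One possible way to avoid the team* row altogether is to keep only the numeric columns before calling describe(). The following is a minimal sketch that uses base R to select the columns. The name num_df is just an illustrative choice, and the describe() call only runs if the psych package is installed:

```r
#recreate the data frame from the example
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#keep only the numeric columns (num_df is a hypothetical name)
num_df <- df[sapply(df, is.numeric)]

#calculate descriptive statistics for the numeric columns only
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(num_df))
}
```

With this approach the output should contain rows for points, assists, and rebounds only.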

Here is how to interpret each value in the output:

  • vars: The column number of the variable
  • n: Total number of observations
  • mean: The mean value
  • sd: The standard deviation of values
  • median: The median value
  • trimmed: The trimmed mean (by default, 10% of values trimmed from each end)
  • mad: The median absolute deviation of values (from the median)
  • min: The minimum value
  • max: The maximum value
  • range: The range of values (max – min)
  • skew: The skewness of values
  • kurtosis: The kurtosis of values
  • se: The standard error of the mean
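
To make these definitions concrete, here is a minimal sketch that recomputes several of the statistics for the points column using only base R functions. It is meant as a cross-check rather than a description of how describe() is implemented. Skewness and kurtosis are left out because base R has no built-in functions for them. Note that the base R mad() function returns the median absolute deviation scaled by a constant of 1.4826.

```r
#points column from the example data frame
points <- c(99, 68, 86, 88, 95, 74, 78, 93)

#number of observations
n <- length(points)

#recompute the statistics with base R functions
stats <- c(n=n,
           mean=mean(points),
           sd=sd(points),
           median=median(points),
           trimmed=mean(points, trim=0.1),   #10% trimmed from each end
           mad=mad(points),                  #scaled median absolute deviation
           min=min(points),
           max=max(points),
           range=max(points) - min(points),
           se=sd(points) / sqrt(n))          #standard error of the mean

#view statistics rounded to two decimal places
round(stats, 2)
```

The rounded values should agree with the points row in the describe() output shown earlier.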

By using the describe() function we can quickly get an overview of the distribution of values for each variable in our data frame.

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Create a Frequency Table by Group in R
How to Create a Frequency Polygon in R
How to Create Relative Frequency Tables in R
