How to Use describeBy() in R


Often you may want to calculate descriptive statistics for each column in a data frame in R, grouped by a particular column.

One of the easiest ways to do so is with the describeBy() function from the psych package.

The describeBy() function uses the following syntax:

describeBy(x, group=NULL, mat=FALSE, type=3, digits=15, ...)

where:

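  • x: The data frame (or matrix) to summarize
  • group: A grouping variable, a list of grouping variables, or the name of a column in x
  • mat: If TRUE, return a single matrix-style table rather than a list of tables
  • type: Which type of skewness and kurtosis to compute (1, 2, or 3)
  • digits: Number of digits to report if matrix output is used
  • ...: Additional arguments passed on to the describe() function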
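As a rough sketch of how mat and digits change the output, the call below reuses the basketball data frame from the example later in this tutorial and asks for one combined table (one row per variable per group) rounded to two digits, instead of a separate table for each group. The object name res and the choice digits=2 are only for illustration, and the exact column layout of the matrix output may differ slightly between psych versions.

```r
library(psych)

#same basketball data as in the example below
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#one combined table instead of a list of per-group tables, rounded to 2 digits
res <- describeBy(df, group='team', mat=TRUE, digits=2)

#view combined table (the group1 column identifies the team for each row)
res
```

With mat=TRUE the result is easier to sort, filter, or export than the default list output.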

The following example shows how to use the describeBy() function in practice to calculate descriptive statistics grouped by a particular column in a data frame in R.

Note: If the psych package is not already installed, you need to install it before using the describeBy() function. You can use the following syntax to do so:

install.packages('psych')

Once the psych package has been installed, load it with library(psych) in each R session before calling describeBy().

Example: How to Use the describeBy() Function in R

Suppose that we create the following data frame in R that contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#view data frame
df

  team points assists rebounds
1    A     99      22       30
2    A     68      28       28
3    A     86      31       24
4    A     88      35       24
5    B     95      34       30
6    B     74      45       36
7    B     78      28       30
8    B     93      31       29

The data frame contains the following information about eight different basketball players:

  • team: The team they are on.
  • points: Their total points scored in the season.
  • assists: Their total assists in the season.
  • rebounds: Their total rebounds in the season.

Suppose that we would like to calculate descriptive statistics for each of the numeric variables in the data frame, grouped by the values in the team column.

We can use the following syntax with the describeBy() function to do so:

library(psych)

#calculate descriptive statistics for numeric columns grouped by team
describeBy(df, group='team')

 Descriptive statistics by group 
group: A
         vars n  mean    sd median trimmed  mad min max range  skew kurtosis
team*       1 4  1.00  0.00    1.0    1.00 0.00   1   1     0   NaN      NaN
points      2 4 85.25 12.84   87.0   85.25 9.64  68  99    31 -0.30    -1.86
assists     3 4 29.00  5.48   29.5   29.00 5.19  22  35    13 -0.18    -1.97
rebounds    4 4 26.50  3.00   26.0   26.50 2.97  24  30     6  0.14    -2.28
           se
team*    0.00
points   6.42
assists  2.74
rebounds 1.50
------------------------------------------------------------ 
group: B
         vars n  mean    sd median trimmed   mad min max range  skew kurtosis
team*       1 4  1.00  0.00    1.0    1.00  0.00   1   1     0   NaN      NaN
points      2 4 85.00 10.55   85.5   85.00 12.60  74  95    21 -0.03    -2.37
assists     3 4 34.50  7.42   32.5   34.50  4.45  28  45    17  0.51    -1.84
rebounds    4 4 31.25  3.20   30.0   31.25  0.74  29  36     7  0.70    -1.72
           se
team*    0.00
points   5.28
assists  3.71
rebounds 1.60

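For each team, the describeBy() function returns the number of observations (n), mean, standard deviation (sd), median, trimmed mean, median absolute deviation (mad), minimum, maximum, range, skewness, kurtosis, and standard error (se) of the points, assists and rebounds columns.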
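With the default mat=FALSE, describeBy() returns a list-like object that holds one table per group, so you can save the result and pull out a single group, or a single value, by indexing with the group name. The following is a minimal sketch that assumes the group names are 'A' and 'B' as in this example; the object name stats is hypothetical.

```r
library(psych)

#basketball data from the example above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#save the grouped summary instead of only printing it
stats <- describeBy(df, group='team')

#summary table for team A only (one row per column of df)
stats[['A']]

#a single value: the mean points for team B
stats[['B']]['points', 'mean']
```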

Note that the team column still appears in the output of each group, even though it is the variable that we grouped on. The asterisk in team* indicates that this non-numeric column was converted to numbers before being summarized, so its statistics are not meaningful and you can ignore these rows.
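If you would rather not see the team* rows at all, one option is to pass only the numeric columns as x and supply the team values separately through the group argument. This is a sketch using base R column selection; any other way of dropping the team column from x should work just as well.

```r
library(psych)

#basketball data from the example above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#summarize only the numeric columns, grouped by the team vector
res <- describeBy(df[, c('points', 'assists', 'rebounds')], group=df$team)

#view results (no team* row in either group)
res
```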

By using the describeBy() function we are able to quickly summarize the most common descriptive statistics for the numeric variables in our data frame, grouped by a character column.
