How to Calculate the Mean by Group in R (With Examples)


Often you may want to calculate the mean by group in R. There are three methods you can use to do so:

Method 1: Use base R.

aggregate(df$col_to_aggregate, list(df$col_to_group_by), FUN=mean) 

Method 2: Use the dplyr package.

library(dplyr)

df %>%
  group_by(col_to_group_by) %>%
  summarise_at(vars(col_to_aggregate), list(name = mean))

Method 3: Use the data.table package.

library(data.table)

dt[ ,list(mean=mean(col_to_aggregate)), by=col_to_group_by]

The following examples show how to use each of these methods in practice.

Method 1: Calculate Mean by Group Using Base R

The following code shows how to use the aggregate() function from base R to calculate the mean points scored by team in the following data frame:

#create data frame
df <- data.frame(team=c('a', 'a', 'b', 'b', 'b', 'c', 'c'),
                 pts=c(5, 8, 14, 18, 5, 7, 7),
                 rebs=c(8, 8, 9, 3, 8, 7, 4))

#view data frame
df

  team pts rebs
1    a   5    8
2    a   8    8
3    b  14    9
4    b  18    3
5    b   5    8
6    c   7    7
7    c   7    4

#find mean points scored by team
aggregate(df$pts, list(df$team), FUN=mean)

  Group.1        x
1       a  6.50000
2       b 12.33333
3       c  7.00000
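The output above uses the generic column names Group.1 and x. As a minimal sketch of an alternative, the formula interface of aggregate() keeps the original column names, and cbind() on the left-hand side averages several columns at once. The variable name means_by_team is only an illustrative choice.

```r
# Same data frame as in the example above
df <- data.frame(team = c('a', 'a', 'b', 'b', 'b', 'c', 'c'),
                 pts = c(5, 8, 14, 18, 5, 7, 7),
                 rebs = c(8, 8, 9, 3, 8, 7, 4))

# Formula interface: "pts by team"; result columns are named team and pts
means_by_team <- aggregate(pts ~ team, data = df, FUN = mean)
means_by_team

# cbind() on the left-hand side computes the mean of pts and rebs per team
aggregate(cbind(pts, rebs) ~ team, data = df, FUN = mean)
```

Note that the formula interface drops rows with missing values by default, which the non-formula call shown earlier does not do.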

Method 2: Calculate Mean by Group Using dplyr

The following code shows how to use the group_by() and summarise_at() functions from the dplyr package to calculate the mean points scored by team in the following data frame:

library(dplyr) 

#create data frame
df <- data.frame(team=c('a', 'a', 'b', 'b', 'b', 'c', 'c'),
                 pts=c(5, 8, 14, 18, 5, 7, 7),
                 rebs=c(8, 8, 9, 3, 8, 7, 4))

#find mean points scored by team 
df %>%
  group_by(team) %>%
  summarise_at(vars(pts), list(name = mean))

# A tibble: 3 x 2
  team   name
  <fct> <dbl>
1 a       6.5
2 b      12.3
3 c       7  
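In dplyr 1.0.0 and later, summarise_at() is marked as superseded in favor of summarise() combined with across(). The sketch below shows what the equivalent calls might look like, assuming a recent dplyr version is installed; the names mean_pts, mean_pts_by_team, and mean_all_by_team are illustrative.

```r
library(dplyr)

# Same data frame as in the example above
df <- data.frame(team = c('a', 'a', 'b', 'b', 'b', 'c', 'c'),
                 pts = c(5, 8, 14, 18, 5, 7, 7),
                 rebs = c(8, 8, 9, 3, 8, 7, 4))

# summarise() with a named expression gives the same means as summarise_at()
mean_pts_by_team <- df %>%
  group_by(team) %>%
  summarise(mean_pts = mean(pts))

# across() applies mean() to several columns in one call
mean_all_by_team <- df %>%
  group_by(team) %>%
  summarise(across(c(pts, rebs), mean))

mean_pts_by_team
mean_all_by_team
```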

Method 3: Calculate Mean by Group Using data.table

The following code shows how to use the data.table package to calculate the mean points scored by team in the following data frame:

library(data.table) 

#create data frame
df <- data.frame(team=c('a', 'a', 'b', 'b', 'b', 'c', 'c'),
                 pts=c(5, 8, 14, 18, 5, 7, 7),
                 rebs=c(8, 8, 9, 3, 8, 7, 4))

#convert data frame to data table 
setDT(df)

#find mean points scored by team 
df[ ,list(mean=mean(pts)), by=team]

   team     mean
1:    a  6.50000
2:    b 12.33333
3:    c  7.00000
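setDT() converts df to a data table in place. If the original data frame should stay unchanged, as.data.table() returns a converted copy instead. The sketch below also uses .SD with .SDcols to average more than one column per group; the names dt and means_dt are illustrative.

```r
library(data.table)

# Same data frame as in the example above
df <- data.frame(team = c('a', 'a', 'b', 'b', 'b', 'c', 'c'),
                 pts = c(5, 8, 14, 18, 5, 7, 7),
                 rebs = c(8, 8, 9, 3, 8, 7, 4))

# as.data.table() returns a copy, so df itself remains a plain data frame
dt <- as.data.table(df)

# .() is shorthand for list() inside a data.table call
dt[, .(mean_pts = mean(pts)), by = team]

# .SD holds the columns named in .SDcols for each group
means_dt <- dt[, lapply(.SD, mean), by = team, .SDcols = c("pts", "rebs")]
means_dt
```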

Notice that all three methods return the same group means; only the column names and the output format differ.
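If the column being averaged contains missing values, mean() returns NA for any group that includes one. The sketch below uses a hypothetical variant of the example data (df_na, with one pts value replaced by NA) to show how passing na.rm = TRUE through aggregate() skips the missing values. In the dplyr and data.table versions, the same effect should come from writing mean(pts, na.rm = TRUE).

```r
# Hypothetical variant of the example data with one missing pts value
df_na <- data.frame(team = c('a', 'a', 'b', 'b', 'b', 'c', 'c'),
                    pts = c(5, NA, 14, 18, 5, 7, 7))

# Without na.rm, the mean for team 'a' is NA
with_na <- aggregate(df_na$pts, list(df_na$team), FUN = mean)
with_na

# Extra arguments such as na.rm = TRUE are passed on to FUN
without_na <- aggregate(df_na$pts, list(df_na$team), FUN = mean, na.rm = TRUE)
without_na
```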

Related: A Complete Guide to the mean Function in R

Additional Resources

How to Calculate the Sum by Group in R
How to Calculate Quantiles by Group in R

4 Replies to “How to Calculate the Mean by Group in R (With Examples)”

  1. I have data of the following type. Sometimes there is also another data set with missing values. I tried the three methods above to calculate the mean of each group, but they did not work. Can you help me, please? Thank you.

    Treat AFI121
    CON 50.64880952
    CON 53.25595238
    CON 58.12903226
    CON 49.26785714
    CON 49.37125749
    CON 39.02380952
    CON 52.13690476
    CON 48.61309524
    ANT 56.97959184
    ANT 49.20238095
    ANT 51.70238095
    ANT 51.7797619
    ANT 53.82142857
    ANT 53.60714286
    ANT 50.25
    ANT 48.17032967
    D400 52.99378882
    D400 53.59868421
    D400 53.61904762
    D400 53.2797619
    D400 48.6547619
    D400 54.35119048
    D400 54.92261905
    D400 50.32738095
    D800 54.76190476
    D800 54.64880952
    D800 53.89880952
    D800 51.61585366
    D800 55.15483871
    D800 50.7202381
    D800 56.05031447
    D800 46.86904762
    D1200 49.69642857
    D1200 55.60509554
    D1200 53.27380952
    D1200 46.8452381
    D1200 50.89285714
    D1200 52.26190476
    D1200 51.46428571
    D1200 52.66666667

    How can I also test mean separation using the Tukey, LSD, and Duncan multiple range tests for the one-way ANOVA above? Many thanks
    email; sefibahir2009@gmail.com

  2. January 27, 2023 at 11:18 am
    I have data of the following type (5 treatments, each with 8 replications, for a total of 5*8=40 observations, i.e. a CRD design, and I want to run a one-way ANOVA). Sometimes there is also a similar data set with missing values. Can you please help me with the following six questions? I have tried many times, but I do not find R friendly, and I want to learn more.
    Things that I need to calculate:
    1. ANOVA analysis (one-way ANOVA)
    2. Summary or descriptive statistics
    3. Mean separation using the Tukey, LSD, and Duncan multiple range tests for the one-way ANOVA above
    4. If the data has missing values, how do I calculate the above while excluding them?
    5. Assigning letters for the significant differences between or among treatment groups
    6. Is it also possible to do the above statistical analysis if there are more than one or two dependent variables (factors)? What is the syntax for that analysis?

    Treat AFI121
    CON 50.64880952
    CON 53.25595238
    CON 58.12903226
    CON 49.26785714
    CON 49.37125749
    CON 39.02380952
    CON 52.13690476
    CON 48.61309524
    ANT 56.97959184
    ANT 49.20238095
    ANT 51.70238095
    ANT 51.7797619
    ANT 53.82142857
    ANT 53.60714286
    ANT 50.25
    ANT 48.17032967
    D400 52.99378882
    D400 53.59868421
    D400 53.61904762
    D400 53.2797619
    D400 48.6547619
    D400 54.35119048
    D400 54.92261905
    D400 50.32738095
    D800 54.76190476
    D800 54.64880952
    D800 53.89880952
    D800 51.61585366
    D800 55.15483871
    D800 50.7202381
    D800 56.05031447
    D800 46.86904762
    D1200 49.69642857
    D1200 55.60509554
    D1200 53.27380952
    D1200 46.8452381
    D1200 50.89285714
    D1200 52.26190476
    D1200 51.46428571
    D1200 52.66666667

    Many thanks email; sefibahir2009@gmail.com
