How to Use the ave() Function in R


Often you may want to calculate summary statistics for one variable, grouped by the levels of one or more other variables in R.

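One way to do so is by using the ave() function from base R, which calculates a statistic within each group and returns a vector of the same length as the input, so every observation is assigned the value for its own group.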

The ave() function uses the following basic syntax:

ave(x, ..., FUN = mean)

where:

  • x: The variable to compute the summary statistic for
  • ...: One or more variables to group by
  • FUN: The summary statistic to calculate for each group
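As a minimal sketch of how the grouping arguments work, the following example groups by two variables at once. The players data frame and its position column are hypothetical and are used only for illustration:

```r
# Hypothetical data: 'position' is an illustrative column
players <- data.frame(team     = c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                      position = c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                      points   = c(22, 25, 30, 34, 19, 14, 13, 18))

# Each extra unnamed argument after x is treated as another grouping variable,
# so this computes the mean points for every team/position combination
players$mean_points <- ave(players$points, players$team, players$position)

players
```

Each row receives the mean for its own team and position combination, so the result has one value per row rather than one value per group.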

Despite the name, the ave() function is not limited to averages. The FUN argument accepts any function, so it can also be used to calculate the min, max, median, or standard deviation of a variable within each group.
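The short sketch below illustrates this with the median and the standard deviation. The scores and group vectors are made-up toy data. Note that FUN has to be passed by name, because unnamed arguments after x are treated as grouping variables:

```r
# Hypothetical toy data: six scores split into two groups
scores <- c(10, 20, 30, 2, 4, 6)
group  <- c('x', 'x', 'x', 'y', 'y', 'y')

# Median of each group, repeated for every observation in that group
group_median <- ave(scores, group, FUN = median)

# Standard deviation of each group, repeated in the same way
group_sd <- ave(scores, group, FUN = sd)

group_median
group_sd
```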

Note: The ave() function comes built-in with base R so you do not need to install or load any external packages to use this function.

Example: How to Use the ave() Function in R

Suppose we create the following data frame that contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(22, 25, 30, 34, 19, 14, 13, 18),
                 assists=c(7, 6, 6, 4, 8, 10, 12, 11))

#view data frame
df

  team points assists
1    A     22       7
2    A     25       6
3    A     30       6
4    A     34       4
5    B     19       8
6    B     14      10
7    B     13      12
8    B     18      11

Suppose that we would like to create a new column that contains the mean number of points scored by the players on each team.

We can use the ave() function with the following syntax to do so:

#create new column to calculate mean points by team
df$mean_points <- ave(df$points, df$team)

#view updated data frame
df

  team points assists mean_points
1    A     22       7       27.75
2    A     25       6       27.75
3    A     30       6       27.75
4    A     34       4       27.75
5    B     19       8       16.00
6    B     14      10       16.00
7    B     13      12       16.00
8    B     18      11       16.00

Notice that the new column named mean_points now contains the mean number of points scored by players on each team.

For example, we can see:

  • The mean points scored by players on team A is 27.75.
  • The mean points scored by players on team B is 16.00.
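To see why ave() is convenient for adding a column, it can help to compare it with tapply() from base R, which returns only one value per group. The following is a small sketch that recreates the team and points columns so that it runs on its own; the names team_means and row_means are illustrative:

```r
# Recreate the team and points columns from the example
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(22, 25, 30, 34, 19, 14, 13, 18))

# tapply() returns one value per group (a named vector of length 2)
team_means <- tapply(df$points, df$team, mean)

# ave() returns one value per row (length 8), so it fits directly into the data frame
row_means <- ave(df$points, df$team)

team_means
length(team_means)
length(row_means)
```

Because the result of ave() lines up with the original rows, no merge or join step is needed to attach the group statistic to the data frame.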

Note that we can use the ave() function to calculate any summary statistic that we would like.

For example, we could use the following syntax to calculate the max points scored by players on each team:

#create new column to calculate max points by team
df$max_points <- ave(df$points, df$team, FUN = max)

#view updated data frame
df

  team points assists mean_points max_points
1    A     22       7       27.75         34
2    A     25       6       27.75         34
3    A     30       6       27.75         34
4    A     34       4       27.75         34
5    B     19       8       16.00         19
6    B     14      10       16.00         19
7    B     13      12       16.00         19
8    B     18      11       16.00         19

Notice that the new column named max_points now contains the max number of points scored by players on each team.

For example, we can see:

  • The max points scored by players on team A is 34.
  • The max points scored by players on team B is 19.

Feel free to specify any summary statistic that you would like in the FUN argument of the ave() function to calculate a different metric instead.

Common choices include the min, max, mean, median and standard deviation among other metrics.
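One detail to keep in mind is that extra arguments such as na.rm cannot be passed to ave() directly, because every unnamed or unrecognized argument is treated as a grouping variable. A minimal sketch of one workaround is to wrap the statistic in an anonymous function. The points and team vectors below are hypothetical and include a missing value for illustration:

```r
# Hypothetical data with one missing value on team A
points <- c(22, NA, 30, 34, 19, 14, 13, 18)
team   <- c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B')

# With the default mean(), any group that contains an NA gets NA for every row
mean_default <- ave(points, team)

# Wrapping mean() in an anonymous function lets us drop NAs within each group
mean_no_na <- ave(points, team, FUN = function(v) mean(v, na.rm = TRUE))

mean_default
mean_no_na
```

In this sketch, every row on team A (including the row with the missing value) receives the mean of the three observed values.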

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Use str_split in R
How to Use str_replace in R
How to Count Words in String in R
How to Convert a Vector to String in R
