Often you may want to calculate summary statistics for one variable, grouped by the levels of one or more other variables in R.

One way to do so is by using the **ave()** function from base R, which is designed to perform this exact task.

The **ave()** function uses the following basic syntax:

**ave(x, …, FUN = mean)**

where:

- **x**: The variable to compute the summary statistic for
- **…**: One or more variables to group by
- **FUN**: The summary statistic to calculate for each group

Despite the name, the **ave()** function can be used to calculate any summary statistic and not just the average of a variable. For example, it can be used to calculate the min, max, median, standard deviation of a variable, etc.
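As a minimal sketch of this point, the following code uses two made-up vectors (`scores` and `group`, which are illustrative assumptions and not part of the example further down) to calculate a median and a custom range statistic by group. It also shows two details that are easy to miss: the result always has the same length as **x**, and **FUN** has to be passed by name, because any unnamed argument after **x** is treated as a grouping variable.

```r
# hypothetical data, for illustration only
scores <- c(10, 20, 30, 40, 50, 60)
group  <- c('a', 'a', 'a', 'b', 'b', 'b')

# median of each group, repeated for every element in that group
med <- ave(scores, group, FUN = median)
med

# the result has the same length as the input vector
length(med) == length(scores)

# any function that summarizes a vector can be supplied, as long as FUN is named
rng <- ave(scores, group, FUN = function(v) max(v) - min(v))
rng
```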

**Note**: The **ave()** function comes built-in with base R so you do not need to install or load any external packages to use this function.

**Example: How to Use the ave() Function in R**

Suppose we create the following data frame that contains information about various basketball players:

**#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
points=c(22, 25, 30, 34, 19, 14, 13, 18),
assists=c(7, 6, 6, 4, 8, 10, 12, 11))
#view data frame
df
team points assists
1 A 22 7
2 A 25 6
3 A 30 6
4 A 34 4
5 B 19 8
6 B 14 10
7 B 13 12
8 B 18 11**

Suppose that we would like to create a new column that calculates the mean number of points scored by each team.

We can use the **ave()** function with the following syntax to do so:

**#create new column to calculate mean points by team
df$mean_points <- ave(df$points, df$team)
#view updated data frame
df
team points assists mean_points
1 A 22 7 27.75
2 A 25 6 27.75
3 A 30 6 27.75
4 A 34 4 27.75
5 B 19 8 16.00
6 B 14 10 16.00
7 B 13 12 16.00
8 B 18 11 16.00**

Notice that the new column named **mean_points** now contains the mean number of points scored by players on each team, repeated on every row that belongs to that team.

For example, we can see:

- The mean points scored by players on team A is **27.75**.
- The mean points scored by players on team B is **16.00**.
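One way to double-check these values is to compute the group means with **tapply()**, another base R function, which returns one value per team instead of one value per row. The sketch below rebuilds the relevant columns of the data frame so that it runs on its own; the name `team_means` is only an illustrative choice.

```r
# same team and points columns as the example above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(22, 25, 30, 34, 19, 14, 13, 18))

# one mean per team, returned as a named vector (A and B)
team_means <- tapply(df$points, df$team, mean)
team_means

# ave() spreads those same values across the rows of each team
all.equal(ave(df$points, df$team), as.vector(team_means[df$team]))
```

If the last line returns TRUE, the per-row values from **ave()** match the per-team means, which makes **ave()** convenient when the result needs to sit alongside the original rows.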

Note that we can use the **ave()** function to calculate any summary statistic that we would like.

For example, we could use the following syntax to calculate the max points scored by players on each team:

**#create new column to calculate max points by team
df$max_points <- ave(df$points, df$team, FUN = max)
#view updated data frame
df
team points assists max_points
1 A 22 7 34
2 A 25 6 34
3 A 30 6 34
4 A 34 4 34
5 B 19 8 19
6 B 14 10 19
7 B 13 12 19
8 B 18 11 19**

Notice that the new column named **max_points** now contains the max number of points scored by players on each team, again repeated on every row for that team.

For example, we can see:

- The max points scored by players on team A is **34**.
- The max points scored by players on team B is **19**.

Feel free to specify any summary statistic that you would like in the **FUN** argument of the **ave()** function to calculate a different metric instead.

Common choices include the min, max, mean, median and standard deviation among other metrics.
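The sketch below illustrates two of these variations: using **sd** as the summary statistic, and grouping by more than one variable. The `position` column is a hypothetical addition made up for this illustration and does not appear in the original data frame.

```r
# example data with an extra, hypothetical 'position' column
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(22, 25, 30, 34, 19, 14, 13, 18))

# standard deviation of points within each team
df$sd_points <- ave(df$points, df$team, FUN = sd)

# mean points within each combination of team and position
df$mean_by_pos <- ave(df$points, df$team, df$position)

# view updated data frame
df
```

Each additional grouping variable is passed as another unnamed argument after **x**, so the groups here are the four team and position combinations.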
