You can use the **year()** function from the lubridate package in R to quickly group data by year.

Combined with **group_by()** and **summarize()** from dplyr, this function uses the following basic syntax:

    library(tidyverse)

    df %>%
      group_by(year = lubridate::year(date_column)) %>%
      summarize(sum = sum(value_column))
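To see what the grouping key looks like before it goes into **group_by()**, here is a minimal sketch of what **year()** returns for a vector of dates. The `dates` vector is hypothetical and used only for illustration.

```r
library(lubridate)

# Hypothetical dates, used only for illustration
dates <- as.Date(c("2021-01-04", "2022-02-10", "2023-03-22"))

# year() extracts the year component of each date as a number
years <- year(dates)

print(years)
# [1] 2021 2022 2023
```

Because every date in the same calendar year maps to the same number, that number can serve as the grouping variable.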

The following example shows how to use this function in practice.

**Example: Group Data by Year in R**

Suppose we have the following data frame in R that shows the total sales of some item on various dates:

    #create data frame
    df <- data.frame(date=as.Date(c('1/4/2021', '1/9/2021', '2/10/2022', '2/15/2022',
                                    '3/5/2022', '3/22/2023', '3/27/2023'), '%m/%d/%Y'),
                     sales=c(8, 14, 22, 23, 16, 17, 23))

    #view data frame
    df

            date sales
    1 2021-01-04     8
    2 2021-01-09    14
    3 2022-02-10    22
    4 2022-02-15    23
    5 2022-03-05    16
    6 2023-03-22    17
    7 2023-03-27    23

We can use the following code to calculate the sum of sales, grouped by year:

    library(tidyverse)

    #group data by year and sum sales
    df %>%
      group_by(year = lubridate::year(date)) %>%
      summarize(sum_sales = sum(sales))

    # A tibble: 3 x 2
       year sum_sales
    1  2021        22
    2  2022        61
    3  2023        40

From the output we can see:

- A total of **22** sales were made in 2021.
- A total of **61** sales were made in 2022.
- A total of **40** sales were made in 2023.

We can also aggregate the data using some other metric.

For example, we could calculate the max sales made in one day, grouped by year:

    library(tidyverse)

    #group data by year and find max sales
    df %>%
      group_by(year = lubridate::year(date)) %>%
      summarize(max_sales = max(sales))

    # A tibble: 3 x 2
       year max_sales
    1  2021        14
    2  2022        23
    3  2023        23

From the output we can see:

- The max sales made in one day in 2021 was **14**.
- The max sales made in one day in 2022 was **23**.
- The max sales made in one day in 2023 was **23**.

Feel free to use whatever metric you’d like within the **summarize()** function.
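As one possible illustration, several metrics can be computed in a single **summarize()** call. The sketch below rebuilds the same data frame so that it runs on its own. It loads only dplyr, the tidyverse package that provides **group_by()** and **summarize()**. The column names `total_sales`, `mean_sales`, and `n_days` are arbitrary choices for this example.

```r
library(dplyr)

# Same sales data as in the example above
df <- data.frame(
  date = as.Date(c('1/4/2021', '1/9/2021', '2/10/2022', '2/15/2022',
                   '3/5/2022', '3/22/2023', '3/27/2023'), format = '%m/%d/%Y'),
  sales = c(8, 14, 22, 23, 16, 17, 23)
)

# Group by year and compute several metrics in one summarize() call
result <- df %>%
  group_by(year = lubridate::year(date)) %>%
  summarize(
    total_sales = sum(sales),   # sum of sales in the year
    mean_sales  = mean(sales),  # average sales per recorded day
    n_days      = n()           # number of rows (dates) in the year
  )

print(result)
```

Each named argument to **summarize()** becomes one column in the result, so more metrics can be added without repeating the grouping step.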

