# How to Group Data by Year in R (With Example)

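You can use the year() function from the lubridate package in R to extract the year from a date column, which makes it easy to group data by year.

Combined with group_by() and summarize() from dplyr, this approach uses the following basic syntax: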

library(tidyverse)

df %>%
  group_by(year = lubridate::year(date_column)) %>%
  summarize(sum = sum(value_column))
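To see what the grouping key looks like, it can help to call year() on its own first. The snippet below is a minimal sketch; the vector name `dates` and its values are made up for illustration.

```r
library(lubridate)

# hypothetical date vector, used only for illustration
dates <- as.Date(c("2021-01-04", "2022-02-10", "2023-03-27"))

# year() returns the year component of each date as a number
years <- year(dates)
years
# [1] 2021 2022 2023
```

Because year() returns one value per row, group_by() can use the result directly as a new grouping column.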

The following example shows how to use this syntax in practice.

## Example: Group Data by Year in R

Suppose we have the following data frame in R that shows the total sales of some item on various dates:

#create data frame
df <- data.frame(date=as.Date(c('1/4/2021', '1/9/2021', '2/10/2022', '2/15/2022',
                                '3/5/2022', '3/22/2023', '3/27/2023'), '%m/%d/%Y'),
                 sales=c(8, 14, 22, 23, 16, 17, 23))

#view data frame
df

        date sales
1 2021-01-04     8
2 2021-01-09    14
3 2022-02-10    22
4 2022-02-15    23
5 2022-03-05    16
6 2023-03-22    17
7 2023-03-27    23

We can use the following code to calculate the sum of sales, grouped by year:

library(tidyverse)

#group data by year and sum sales
df %>%
  group_by(year = lubridate::year(date)) %>%
  summarize(sum_sales = sum(sales))

# A tibble: 3 x 2
   year sum_sales
  <dbl>     <dbl>
1  2021        22
2  2022        61
3  2023        40

From the output we can see:

• A total of 22 sales were made in 2021.
• A total of 61 sales were made in 2022.
• A total of 40 sales were made in 2023.
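As a cross-check, the same yearly totals can be reproduced without the tidyverse. The sketch below is a base R alternative, not the method used in this article: it extracts the year with format() and sums with aggregate(). The helper column `year` and the object name `yearly_sums` are assumptions introduced for illustration.

```r
# recreate the example data frame (dates written in ISO format here)
df <- data.frame(
  date = as.Date(c("2021-01-04", "2021-01-09", "2022-02-10", "2022-02-15",
                   "2022-03-05", "2023-03-22", "2023-03-27")),
  sales = c(8, 14, 22, 23, 16, 17, 23)
)

# extract the four-digit year as an integer column
df$year <- as.integer(format(df$date, "%Y"))

# sum sales within each year using base R
yearly_sums <- aggregate(sales ~ year, data = df, FUN = sum)
yearly_sums
```

The sales column of the result should match the sums shown above: 22, 61, and 40.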

We can also aggregate the data using some other metric.

For example, we could calculate the max sales made in one day, grouped by year:

library(tidyverse)

#group data by year and find max sales
df %>%
  group_by(year = lubridate::year(date)) %>%
  summarize(max_sales = max(sales))

# A tibble: 3 x 2
   year max_sales
  <dbl>     <dbl>
1  2021        14
2  2022        23
3  2023        23

From the output we can see:

• The max sales made in one day in 2021 was 14.
• The max sales made in one day in 2022 was 23.
• The max sales made in one day in 2023 was 23.

Feel free to use whatever metric you’d like within the summarize() function.
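Several metrics can also be computed in a single summarize() call. The following is a minimal sketch using the same example data; the column names `mean_sales` and `n_days` and the object name `yearly` are chosen here for illustration.

```r
library(dplyr)
library(lubridate)

# recreate the example data frame (dates written in ISO format here)
df <- data.frame(
  date = as.Date(c("2021-01-04", "2021-01-09", "2022-02-10", "2022-02-15",
                   "2022-03-05", "2023-03-22", "2023-03-27")),
  sales = c(8, 14, 22, 23, 16, 17, 23)
)

# group data by year, then compute the mean sales and the number of rows per year
yearly <- df %>%
  group_by(year = year(date)) %>%
  summarize(
    mean_sales = mean(sales),
    n_days = n()
  )

yearly
```

Each argument to summarize() becomes one column in the output, so the result here has one row per year with the columns year, mean_sales, and n_days.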