R: Use lubridate to Calculate Difference Between Dates

Often you may want to calculate the difference between two dates in R.

Fortunately this is easy to do with the interval() function from the lubridate package, which creates a time span between two dates that you can then convert to whole years, months, or other units.

The interval() function uses the following basic syntax:

interval(start = NULL, end = NULL)

where:

• start: The starting date
• end: The ending date
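As a minimal sketch of this syntax, the snippet below builds an interval from two dates chosen purely for illustration and then measures it in two ways. It assumes the lubridate package is already installed, and the variable names (start, end, span) are hypothetical:

```r
library(lubridate)

# Two dates chosen only for illustration
start <- ymd("2022-01-03")
end <- ymd("2022-03-03")

# Build an Interval object that spans the two dates
span <- interval(start, end)

# Integer-divide the interval by a period to count whole units
whole_months <- span %/% months(1)

# time_length() returns the length of the interval in the requested unit
total_days <- time_length(span, "day")

print(span)
print(whole_months)
print(total_days)
```

The interval itself stores only the two endpoints; the unit is chosen afterwards, when the interval is divided or measured.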

The following example shows how to use the interval() function from the lubridate package to calculate the difference between two dates in practice.

Note: Before using the interval() function, you may need to first install the lubridate package by using the following syntax:

`install.packages('lubridate')`

Once the lubridate package is installed, you can use the interval() function.

Example: How to Calculate Difference Between Two Dates in R

Suppose we create the following data frame named df that contains information about various employees at some company:

```
#create data frame
df <- data.frame(start_date=c('2022-01-03', '2022-02-15', '2023-05-09',
                              '2023-08-10', '2024-10-14', '2024-12-30'),
                 end_date=c('2022-01-09', '2024-02-15', '2024-05-19',
                            '2024-03-10', '2024-12-14', '2025-11-30'),
                 sales=c(130, 98, 120, 88, 94, 100))

#view data frame
df

  start_date   end_date sales
1 2022-01-03 2022-01-09   130
2 2022-02-15 2024-02-15    98
3 2023-05-09 2024-05-19   120
4 2023-08-10 2024-03-10    88
5 2024-10-14 2024-12-14    94
6 2024-12-30 2025-11-30   100
```

This data frame contains the following columns:

• start_date: The date that the employee first started working at the company
• end_date: The date that the employee stopped working at the company
• sales: The total sales made by the employee during their tenure

Suppose that we would like to calculate the difference between the start_date and end_date values for each employee in the data frame.

We can use the interval() function from the lubridate package to do so:

```
library(lubridate)

#calculate difference between start and end dates
df$tenure <- interval(ymd(df$start_date), ymd(df$end_date))

#convert interval to total number of whole years
df$tenure <- df$tenure %/% years(1)

#view updated data frame
df

  start_date   end_date sales tenure
1 2022-01-03 2022-01-09   130      0
2 2022-02-15 2024-02-15    98      2
3 2023-05-09 2024-05-19   120      1
4 2023-08-10 2024-03-10    88      0
5 2024-10-14 2024-12-14    94      0
6 2024-12-30 2025-11-30   100      0
```

Notice that a new column named tenure has been added to the data frame, showing the number of full years that each employee worked at the company.

For example:

• The first employee worked 0 full years at the company.
• The second employee worked 2 full years at the company.
• The third employee worked 1 full year at the company.

And so on.

Note that we used the integer division operator (%/%) to divide the result of the interval() function by years(1), which returns the date difference as a number of whole years and discards any partial year.
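To make the effect of integer division concrete, here is a small sketch that uses the third employee's dates from the data frame above. It compares %/% with lubridate's time_length() function, which keeps the fractional part; the variable names are hypothetical:

```r
library(lubridate)

# Start and end dates of the third employee in the example data frame
span <- interval(ymd("2023-05-09"), ymd("2024-05-19"))

# Integer division discards the partial year
whole_years <- span %/% years(1)

# time_length() keeps the fractional part of the year
exact_years <- time_length(span, "year")

print(whole_years)  # 1, matching the tenure column
print(exact_years)  # slightly more than 1
```

Use %/% when only completed years matter, and time_length() when the partial year should count as well.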

We could instead use months(1) as the divisor to display the date difference as a number of whole months:

```
library(lubridate)

#calculate difference between start and end dates
df$tenure <- interval(ymd(df$start_date), ymd(df$end_date))

#convert interval to total number of whole months
df$tenure <- df$tenure %/% months(1)

#view updated data frame
df

  start_date   end_date sales tenure
1 2022-01-03 2022-01-09   130      0
2 2022-02-15 2024-02-15    98     24
3 2023-05-09 2024-05-19   120     12
4 2023-08-10 2024-03-10    88      7
5 2024-10-14 2024-12-14    94      2
6 2024-12-30 2025-11-30   100     11
```

The tenure column now shows the number of whole months that each employee worked at the company.

Feel free to display the date difference in months, years, or any other unit for which lubridate provides a period function, such as weeks() or days().
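As a hedged sketch of other units, the snippet below takes the first and fifth rows of the example data and counts whole weeks and whole days. The data frame name dates and the columns tenure_weeks and tenure_days are hypothetical, chosen so that each unit gets its own column and no result is overwritten:

```r
library(lubridate)

# First and fifth rows of the example data frame
dates <- data.frame(start_date = c("2022-01-03", "2024-10-14"),
                    end_date = c("2022-01-09", "2024-12-14"))

# Build one interval per row
span <- interval(ymd(dates$start_date), ymd(dates$end_date))

# Store each unit in its own column
dates$tenure_weeks <- span %/% weeks(1)  # 0 and 8 whole weeks
dates$tenure_days <- span %/% days(1)    # 6 and 61 whole days

dates
```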

Note: The complete documentation for the interval() function is available in the official lubridate package reference.