How to Fill in Missing Dates in R


Often you may want to fill in missing dates in a column of a data frame in R.

This is common when you’re working with any type of time series data and you have inconsistent datetimes or simply missing values for some dates.

The easiest way to fill in missing dates is by using the complete() function from the tidyr package, which is designed for this exact task.

For example, you can use the following basic syntax to fill in missing dates for a column named date in a data frame:

tidyr::complete(df, date=seq(min(date), max(date), by="1 day"))

In this particular example, we use the complete() function to fill in missing dates in the date column of a data frame, specifying that the difference between each date period should be 1 day.

Note that you can use any datetime interval that you would like in the by argument of the function to instead use a different datetime interval.

The following example shows how to use this syntax in practice.

Note: Before using the complete() function, you may need to first install the tidyr package in R:

install.packages('tidyr')

Once the tidyr package has been installed, you will be able to use the complete() function to fill in missing dates in a data frame.

Example: How to Fill in Missing Dates in R

Suppose we create the following data frame named df that contains information about total sales made by some company on various dates:

#create data frame
df <- data.frame(date=c('1/10/2024', '1/11/2024', '1/12/2024', '1/14/2024'),
                 sales=c(100, 145, 187, 185))

#convert date column to date class
df$date <- as.POSIXct(df$date,format="%m/%d/%Y")

#view data frame
df

        date sales
1 2024-01-10   100
2 2024-01-11   145
3 2024-01-12   187
4 2024-01-14   185

The date column contains various dates and the sales column contains the total sales made on each date.

Notice that from the earliest date to the latest date, there is one day missing in the date column: 2024-01-13.

Suppose that we would like to fill in this missing date and simply have a value of NA in the corresponding element of the sales column to show that we’re missing sales date for that day.

The easiest way to do so is by using the complete() function from the tidyr package with the following syntax:

library(tidyr)

#fill in missing dates in 'date' column of date frame
tidyr::complete(df, date=seq(min(date), max(date), by="1 day"))

  date                sales
                
1 2024-01-10 00:00:00   100
2 2024-01-11 00:00:00   145
3 2024-01-12 00:00:00   187
4 2024-01-13 00:00:00    NA
5 2024-01-14 00:00:00   185

Notice that this returns a new data frame in which the date 2024-01-13 has been inserted as a new value in the date column and a value of NA has been inserted into the corresponding element of the sales column.

Note that we only filled in one missing date in this particular example but you can use similar syntax to fill in any number of missing dates in a column in R.

Note: You can find the complete documentation for the complete() function in tidyr here.

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Check if Column is Date in R
How to Convert Datetime to Date in R
How to Generate a Sequence of Dates with lubridate in R
How to Convert Excel Date Format to Proper Date in R

Leave a Reply

Your email address will not be published. Required fields are marked *