How to Use the fill() Function in R


Often you may want to fill in missing values in a data frame in R with the previous value or the next available value.

One of the easiest ways to do so is by using the fill() function from the tidyr package in R.

This function uses the following basic syntax:

fill(data, ..., .direction = c("down", "up", "downup", "updown"))

where:

  • data: The name of a data frame
  • ...: Columns to fill missing values in
  • .direction: Direction in which to fill missing values. Options include “down” (the default), “up”, “downup” (i.e. first down and then up) or “updown” (first up and then down).

Note that the default behavior of the fill() function is to fill in missing values in a “down” manner, meaning each missing value is replaced with the most recent non-missing value above it in the same column. A missing value with no non-missing value above it (for example, in the first row) is left as NA.

Feel free to use the .direction argument to specify another method to use for filling in missing values if you would like.
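To make the four options concrete, here is a minimal sketch that applies three of them to a small made-up data frame. The data frame toy and its column x are hypothetical and exist only for this illustration:

```r
library(tidyr)

#hypothetical toy data: the first and last values are missing
toy <- data.frame(x = c(NA, 1, NA, 3, NA))

#"down" (the default) cannot fill the first NA because nothing sits above it
down <- fill(toy, x)
down$x
#[1] NA  1  1  3  3

#"downup" fills down first, then fills any remaining NA values upward
downup <- fill(toy, x, .direction = "downup")
downup$x
#[1] 1 1 1 3 3

#"updown" fills up first, then fills any remaining NA values downward
updown <- fill(toy, x, .direction = "updown")
updown$x
#[1] 1 1 3 3 3
```

The two-step options are mainly useful when a column starts or ends with a missing value, since a single direction would leave that value as NA.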

Note: You may first need to install the tidyr package in R before you can use the fill() function. You can use the following syntax to do so:

install.packages('tidyr')

Once the tidyr package is installed, load it with library(tidyr) to make the fill() function available in your session.
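If you run the same script repeatedly, you may prefer to install the package only when it is missing. The following is one possible sketch using requireNamespace() from base R. The CRAN mirror URL is an assumption, and any mirror you normally use would work:

```r
#install tidyr only if it is not already available
#(the repos value is an assumed mirror; substitute your own if needed)
if (!requireNamespace("tidyr", quietly = TRUE)) {
  install.packages("tidyr", repos = "https://cloud.r-project.org")
}

#load the package so that fill() can be called directly
library(tidyr)
```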

Example: How to Use the fill() Function in R

Suppose we create the following data frame in R that contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, NA, 31, 35, 34, NA, 28, 31),
                 rebounds=c(30, NA, NA, 24, 30, 36, 30, 29))

#view data frame
df

  team points assists rebounds
1    A     99      22       30
2    A     68      NA       NA
3    A     86      31       NA
4    A     88      35       24
5    B     95      34       30
6    B     74      NA       36
7    B     78      28       30
8    B     93      31       29

The data frame contains the following columns:

  • team: The name of the team each player belongs to
  • points: The total points scored by the player
  • assists: The total assists made by the player
  • rebounds: The total rebounds by the player

Notice that several of the columns in the data frame have missing values (NA) in various locations.
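Rather than scanning the printed output by eye, you can count the missing values in each column. This is a small sketch using only base R functions, with the data frame recreated so that the snippet runs on its own:

```r
#recreate the data frame from above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, NA, 31, 35, 34, NA, 28, 31),
                 rebounds=c(30, NA, NA, 24, 30, 36, 30, 29))

#count the number of NA values in each column
na_counts <- colSums(is.na(df))
na_counts

#    team   points  assists rebounds 
#       0        0        2        2 
```

Here the assists and rebounds columns each contain two missing values, while team and points are complete.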

Suppose that we would like to fill in each of the missing values in the assists column of the data frame with the most recent non-missing value above it.

We can use the fill() function from the tidyr package to do so:

library(tidyr)

#fill in missing values in assists column
df %>% fill(assists)

  team points assists rebounds
1    A     99      22       30
2    A     68      22       NA
3    A     86      31       NA
4    A     88      35       24
5    B     95      34       30
6    B     74      34       36
7    B     78      28       30
8    B     93      31       29

Notice that each of the missing values in the assists column of the data frame has been filled in with the most recent non-missing value above it. The rebounds column still contains NA values because it was not passed to fill().
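Because the ... argument accepts more than one column, the same call can fill several columns at once. The sketch below recreates the data frame so it runs on its own and fills both assists and rebounds downward. The name df_filled is just an illustrative choice:

```r
library(tidyr)

#recreate the data frame from above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, NA, 31, 35, 34, NA, 28, 31),
                 rebounds=c(30, NA, NA, 24, 30, 36, 30, 29))

#fill in missing values in both the assists and rebounds columns
df_filled <- df %>% fill(assists, rebounds)

df_filled$assists
#[1] 22 22 31 35 34 34 28 31

df_filled$rebounds
#[1] 30 30 30 24 30 36 30 29
```

Note that the two consecutive missing values in rebounds (rows 2 and 3) both receive the value 30 from row 1.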

Note that we could also specify to use “up” as the fill direction instead:

library(tidyr)

#fill in missing values in assists column, filling upward
df %>% fill(assists, .direction="up")

  team points assists rebounds
1    A     99      22       30
2    A     68      31       NA
3    A     86      31       NA
4    A     88      35       24
5    B     95      34       30
6    B     74      28       36
7    B     78      28       30
8    B     93      31       29

Notice that each of the missing values in the assists column of the data frame has been filled in with the next non-missing value below it.
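One thing to keep in mind is that fill() works down (or up) the entire column, so a value from one team can spill over into another team. If that is not what you want, one common approach is to group the data with group_by() from the dplyr package first, since fill() is applied within each group of a grouped data frame. The sketch below assumes dplyr is installed and uses a small hypothetical data frame, df2, in which the first row of team B is missing:

```r
library(dplyr)
library(tidyr)

#hypothetical data: the first assists value for team B is missing
df2 <- data.frame(team=c('A', 'A', 'B', 'B'),
                  assists=c(22, 31, NA, 28))

#without grouping, the value 31 from team A is carried into team B
ungrouped <- df2 %>% fill(assists)
ungrouped$assists
#[1] 22 31 31 28

#with grouping, the fill stops at the team boundary and the NA remains
grouped <- df2 %>%
  group_by(team) %>%
  fill(assists) %>%
  ungroup()
grouped$assists
#[1] 22 31 NA 28
```

In the grouped version the missing value stays NA because team B has no earlier value to carry down. Using .direction="downup" inside the grouped call would fill it with 28 instead.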

Note: You can find the complete documentation for the fill() function in the tidyr package reference.

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Use slice_max() in dplyr
How to Rename Columns Using dplyr
How to Add Row to Data Frame Using dplyr
How to Use the pull() Function in dplyr
