How to Use the na.locf() Function in R


Often you may want to replace NA values in a data frame in R with the most recently available prior value.

The easiest way to do so is by using the na.locf() function from the zoo package, whose name stands for "last observation carried forward."

The na.locf() function uses the following syntax:

na.locf(object, na.rm=TRUE, fromLast, ...)

where:

  • object: The object to fill, such as a vector, data frame, or zoo object
  • na.rm: Whether leading NA values should be removed or not (default is TRUE)
  • fromLast: Whether observations should be carried backward rather than forward (default is FALSE)
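
As a minimal sketch of what na.rm controls, the following applies na.locf() to a small made-up vector (x is for illustration only and is not part of the example in this tutorial). A leading NA has no earlier value to carry forward, so it is either dropped or kept:

```r
library(zoo)

#made-up vector with a leading NA (for illustration only)
x <- c(NA, 5, NA, 7)

#default na.rm=TRUE: the leading NA is dropped, so the result is shorter
dropped <- na.locf(x)
dropped

#na.rm=FALSE: the leading NA is kept, so the length is unchanged
kept <- na.locf(x, na.rm=FALSE)
kept
```

Here dropped should contain 5 5 7 while kept should contain NA 5 5 7.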

The following example shows how to use the na.locf() function in practice in R.

Example: How to Use the na.locf() Function in R

Suppose that we create a data frame in R that contains information about various basketball players:

#create data frame
df <- data.frame(points=c(8, NA, 14, NA, 13, 28, 20, 24, 28, 30, 34, 40),
                 assists=c(3, 8, 8, 6, 10, 14, 8, 17, 13, 9, 10, 11),
                 rebounds=c(10, 8, NA, NA, 9, 5, 8, 6, 5, 4, 3, 3),
                 steals=c(2, 4, 4, 5, 3, 6, 7, 5, 7, 7, 9, 12))

#view data frame
df

   points assists rebounds steals
1       8       3       10      2
2      NA       8        8      4
3      14       8       NA      4
4      NA       6       NA      5
5      13      10        9      3
6      28      14        5      6
7      20       8        8      7
8      24      17        6      5
9      28      13        5      7
10     30       9        4      7
11     34      10        3      9
12     40      11        3     12

Notice that the points and rebounds columns each contain NA values, which represent missing values.

Suppose that we would like to replace each of the NA values in each column with the most recently available prior value.

We can use the na.locf() function from the zoo package to do so:

library(zoo)

#replace NA values in each column with most recently available value
na.locf(df)

   points assists rebounds steals
1       8       3       10      2
2       8       8        8      4
3      14       8        8      4
4      14       6        8      5
5      13      10        9      3
6      28      14        5      6
7      20       8        8      7
8      24      17        6      5
9      28      13        5      7
10     30       9        4      7
11     34      10        3      9
12     40      11        3     12

Notice that the NA values in each column have been replaced with the most recently available value in the same column.
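
One way to confirm the result without reading the output row by row is to count the NA values in each column before and after filling. The sketch below rebuilds the same data frame so that it runs on its own; the names filled, na_before, and na_after are introduced here for illustration:

```r
library(zoo)

#same data frame as in the example
df <- data.frame(points=c(8, NA, 14, NA, 13, 28, 20, 24, 28, 30, 34, 40),
                 assists=c(3, 8, 8, 6, 10, 14, 8, 17, 13, 9, 10, 11),
                 rebounds=c(10, 8, NA, NA, 9, 5, 8, 6, 5, 4, 3, 3),
                 steals=c(2, 4, 4, 5, 3, 6, 7, 5, 7, 7, 9, 12))

#count NA values in each column before filling
na_before <- colSums(is.na(df))
na_before

#fill NA values, then count again
filled <- na.locf(df)
na_after <- colSums(is.na(filled))
na_after
```

The counts for points and rebounds should drop from 2 to 0, and the filled data frame should still have all 12 rows.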

When we pass the entire data frame to na.locf(), the NA values in every column are replaced. However, we can replace the NA values in only one column by passing just that column to the function.

For example, we can use the following syntax with the na.locf() function to replace missing values in the points column only:

library(zoo)

#replace NA values in points column with most recently available value
df$points <- na.locf(df$points)

#view updated data frame
df

   points assists rebounds steals
1       8       3       10      2
2       8       8        8      4
3      14       8       NA      4
4      14       6       NA      5
5      13      10        9      3
6      28      14        5      6
7      20       8        8      7
8      24      17        6      5
9      28      13        5      7
10     30       9        4      7
11     34      10        3      9
12     40      11        3     12

Notice that only the NA values in the points column have been replaced.
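
One caveat when assigning the result back to a single column: because na.rm=TRUE is the default, a column that starts with NA comes back shorter than the data frame, and the assignment fails. The following is a minimal sketch of keeping the length intact with na.rm=FALSE, using a small hypothetical data frame df2 that is not part of the example above:

```r
library(zoo)

#hypothetical data frame whose points column starts with NA
df2 <- data.frame(points=c(NA, 8, NA, 14),
                  assists=c(3, 8, 8, 6))

#na.rm=FALSE keeps the leading NA so the column length still matches
df2$points <- na.locf(df2$points, na.rm=FALSE)

#view updated data frame
df2
```

The leading NA stays in place because there is no earlier value to carry forward, while the NA in the third row is replaced with 8.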

Note that in each of the examples so far we have replaced each NA value with the most recent non-missing value that occurs before it.

However, we can use the fromLast argument within the na.locf() function to instead fill in each NA value with the next available value that occurs after it in the column.

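The following syntax shows how to do so. Because the previous example overwrote the points column, we first restore its original values:

library(zoo)

#restore the original points column, which the previous example filled in
df$points <- c(8, NA, 14, NA, 13, 28, 20, 24, 28, 30, 34, 40)

#replace NA values in each column with the next available value after it
na.locf(df, fromLast=TRUE)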

   points assists rebounds steals
1       8       3       10      2
2      14       8        8      4
3      14       8        9      4
4      13       6        9      5
5      13      10        9      3
6      28      14        5      6
7      20       8        8      7
8      24      17        6      5
9      28      13        5      7
10     30       9        4      7
11     34      10        3      9
12     40      11        3     12

Notice that the NA values in each column have been replaced with the next available value after them in the same column.

Depending on how you’d like to fill in NA values, you may choose to use the fromLast argument or not.
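
To make the difference between the two directions concrete, here is a short sketch that fills the same vector both ways. The vector x holds the first five values of the points column from the example; the names forward and backward are introduced here for illustration:

```r
library(zoo)

#first five values of the points column from the example
x <- c(8, NA, 14, NA, 13)

#carry the last observation forward (default behavior)
forward <- na.locf(x)
forward

#carry the next observation backward
backward <- na.locf(x, fromLast=TRUE)
backward
```

Filling forward should give 8 8 14 14 13, while filling backward should give 8 14 14 13 13, which matches the first five rows of the points column in the outputs shown earlier.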

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Sort a Table in R
How to Plot a Table in R
How to Create a Three-Way Table in R
How to Create a Frequency Table by Group in R
