How to Drop Unused Factor Levels in a Subsetted Data Frame in R

This tutorial explains a simple way to drop unused factor levels in a data frame in R.

Dropping Unused Factor Levels

Suppose we have the following data frame in R:

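#stringsAsFactors = TRUE stores letters as a factor (the default is FALSE since R 4.0.0)
data <- data.frame(letters = letters[1:5],
                   numbers = c(1, 2, 4, 7, 8),
                   stringsAsFactors = TRUE)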

data

#  letters numbers
#1       a       1
#2       b       2
#3       c       4
#4       d       7
#5       e       8

#view factor levels of column letters
levels(data$letters)

#[1] "a" "b" "c" "d" "e"

We can see that letters is a factor variable with five levels. Now suppose we want to take a subset of this data frame that includes only the rows where the value of numbers is less than 5:

sub_data <- subset(data, numbers < 5)
sub_data

#  letters numbers
#1       a       1
#2       b       2
#3       c       4

#view factor levels of column letters
levels(sub_data$letters)

#[1] "a" "b" "c" "d" "e"

Although the subsetted data frame contains only the values “a”, “b”, and “c”, the letters column still reports five factor levels: “a”, “b”, “c”, “d”, and “e”. Subsetting removes rows but leaves the factor's set of levels unchanged.
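One place the leftover levels become visible is in frequency tables, where unused levels are still counted with a value of 0. The following is a minimal sketch of that behavior. It rebuilds the example data with stringsAsFactors = TRUE (an assumption, so that letters is a factor on any R version), and the name counts is only illustrative:

```r
#rebuild the example data (stringsAsFactors = TRUE is assumed so letters is a factor)
data <- data.frame(letters = letters[1:5],
                   numbers = c(1, 2, 4, 7, 8),
                   stringsAsFactors = TRUE)
sub_data <- subset(data, numbers < 5)

#count rows per level; unused levels "d" and "e" appear with a count of 0
counts <- table(sub_data$letters)
counts

#a b c d e
#1 1 1 0 0

#number of levels stored in the factor
nlevels(sub_data$letters)

#[1] 5

#number of distinct values actually present
length(unique(sub_data$letters))

#[1] 3
```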

To drop the unused factor levels “d” and “e”, we can call droplevels() on the column:

#drop unused factor levels 
sub_data$letters <- droplevels(sub_data$letters)

#view factor levels
levels(sub_data$letters)

#[1] "a" "b" "c"

We can see that droplevels() removed the unused factor levels “d” and “e” from the letters column of our subsetted data frame.
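droplevels() also accepts an entire data frame, in which case it drops unused levels from every factor column at once. Re-creating the factor with factor() is another common way to get the same result for a single column. The sketch below shows both approaches on the example data. The names clean_all and releveled are hypothetical, and stringsAsFactors = TRUE is again an assumption so that letters is a factor:

```r
#rebuild the example data (stringsAsFactors = TRUE is assumed so letters is a factor)
data <- data.frame(letters = letters[1:5],
                   numbers = c(1, 2, 4, 7, 8),
                   stringsAsFactors = TRUE)
sub_data <- subset(data, numbers < 5)

#option 1: drop unused levels from all factor columns in the data frame
clean_all <- droplevels(sub_data)
levels(clean_all$letters)

#[1] "a" "b" "c"

#option 2: re-create a single factor so that only observed values become levels
releveled <- factor(sub_data$letters)
levels(releveled)

#[1] "a" "b" "c"
```

The data frame form is convenient when several factor columns are affected by the same subset, while factor() is useful when only one column needs to be changed.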
