This tutorial explains a simple way to drop unused factor levels in a data frame in R.

**Dropping Unused Factor Levels**

Suppose we have the following data frame in R:

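    #create data frame (stringsAsFactors = TRUE is needed in R >= 4.0
    #so that the letters column is stored as a factor)
    data <- data.frame(letters = letters[1:5],
                       numbers = c(1, 2, 4, 7, 8),
                       stringsAsFactors = TRUE)
    data

    #  letters numbers
    #1       a       1
    #2       b       2
    #3       c       4
    #4       d       7
    #5       e       8

    #view factor levels of column letters
    levels(data$letters)

    #[1] "a" "b" "c" "d" "e"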

We can see that *letters* is a factor variable with five levels. Now suppose we want to take a subset of this data frame that only includes the rows where the value for *numbers* is less than 5:

    #keep only the rows where numbers is less than 5
    sub_data <- subset(data, numbers < 5)
    sub_data

    #  letters numbers
    #1       a       1
    #2       b       2
    #3       c       4

    #view factor levels of column letters
    levels(sub_data$letters)

    #[1] "a" "b" "c" "d" "e"

Although the subsetted data frame only contains the values “a”, “b”, and “c”, the *letters* column still reports all five factor levels: “a”, “b”, “c”, “d”, and “e”. Subsetting removes rows, but it does not change the set of levels stored with the factor.
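One way to see the leftover levels in practice is to count the rows per level. The following is a minimal sketch that rebuilds the example data; the `stringsAsFactors = TRUE` argument is an assumption added so that *letters* is a factor in R 4.0 and later, and `level_counts` is a name introduced here for illustration only.

```r
# Rebuild the example data; stringsAsFactors = TRUE is an assumption
# so that the letters column is a factor in R >= 4.0
data <- data.frame(letters = letters[1:5],
                   numbers = c(1, 2, 4, 7, 8),
                   stringsAsFactors = TRUE)
sub_data <- subset(data, numbers < 5)

# Count rows per level; unused levels still appear, with a count of zero
level_counts <- table(sub_data$letters)
level_counts

# nlevels() counts every declared level, whether or not it is used
nlevels(sub_data$letters)
```

Here the levels “d” and “e” show up in the table with a count of zero, and `nlevels()` still returns 5. Empty levels like these can carry over into summaries and plots, which is why it can be useful to drop them.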

To drop the unused factor levels “d” and “e”, we can use the **droplevels()** function:

    #drop unused factor levels
    sub_data$letters <- droplevels(sub_data$letters)

    #view factor levels
    levels(sub_data$letters)

    #[1] "a" "b" "c"

We can see that the function **droplevels()** allowed us to drop the unused factor levels in our subsetted data frame.
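As a variation on the approach above, **droplevels()** can also be applied to an entire data frame, which drops unused levels from every factor column at once. The sketch below rebuilds the example data (again assuming `stringsAsFactors = TRUE` so that *letters* is a factor) and also shows re-creating a single factor with `factor()`, which has the same effect for one column. The names `sub_clean` and `refactored` are illustrative only.

```r
# Rebuild the example data; stringsAsFactors = TRUE is an assumption
# so that the letters column is a factor in R >= 4.0
data <- data.frame(letters = letters[1:5],
                   numbers = c(1, 2, 4, 7, 8),
                   stringsAsFactors = TRUE)
sub_data <- subset(data, numbers < 5)

# droplevels() has a data frame method that cleans every factor column
sub_clean <- droplevels(sub_data)
levels(sub_clean$letters)

# For a single column, re-creating the factor from its own values
# also discards the levels that no longer occur
refactored <- factor(sub_data$letters)
levels(refactored)
```

Both calls leave only the levels “a”, “b”, and “c”. The data frame form is convenient when a subset contains several factor columns, since each one would otherwise need its own call.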