How to Efficiently Convert a Factor to Numeric or Integer in R

Suppose we have the following vector that is of the class factor and has 10 elements:

#make this example reproducible
set.seed(0)

#create factor vector with 10 elements
data <- factor(sample(c(.15, .30, .45), 10, replace = TRUE))

#view vector
data

# [1] 0.45 0.15 0.3  0.3  0.45 0.15 0.45 0.45 0.3  0.3 
#Levels: 0.15 0.3 0.45

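To convert this vector into a numeric vector we may try to use as.numeric(), but this does not return the original values. Instead, it returns the integer codes that R stores internally, where each code is the position of that element's value among the factor levels: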

as.numeric(data)

# [1] 3 1 2 2 3 1 3 3 2 2

We run into the same problem when we attempt to convert this vector into an integer vector. Instead of getting the original values as integers, we get the internal integer codes once again:

as.integer(data)

# [1] 3 1 2 2 3 1 3 3 2 2
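
The following is a minimal sketch of what a factor stores internally, which may help explain the output above. The vector f is a small made-up example, not the data vector from this tutorial:

```r
#small example factor (hypothetical values chosen for illustration)
f <- factor(c(0.45, 0.15, 0.3, 0.3))

#the unique values are stored once, as character labels
levels(f)

#[1] "0.15" "0.3"  "0.45"

#each element is stored as an integer position into levels(f)
as.integer(f)

#[1] 3 1 2 2

#looking up each position in the levels recovers the labels
levels(f)[as.integer(f)]

#[1] "0.45" "0.15" "0.3"  "0.3"
```

Since the labels are character strings, they still need to be passed through as.numeric() to get the numeric values back.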

It turns out there are two simple ways to convert a factor to numeric or integer in R while keeping the original values:

 1. Use as.numeric(as.character(x))

The first way to convert a factor to a numeric or integer is to first convert it to a character:

#convert factor to numeric
as.numeric(as.character(data))

#[1] 0.45 0.15 0.30 0.30 0.45 0.15 0.45 0.45 0.30 0.30

#convert factor to integer (the decimal values here are truncated toward zero)
as.integer(as.character(data)) 

#[1] 0 0 0 0 0 0 0 0 0 0

 2. Use as.numeric(levels(x))[x]

The second way to convert a factor to a numeric or integer is to use levels():

#convert factor to numeric 
as.numeric(levels(data))[data]

#[1] 0.45 0.15 0.30 0.30 0.45 0.15 0.45 0.45 0.30 0.30

#convert factor to integer (the decimal values here are truncated toward zero)
as.integer(levels(data))[data]

# [1] 0 0 0 0 0 0 0 0 0 0
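
The integer conversions above return all zeros only because every value in the example is less than 1. As a short sketch, here is how both approaches behave on a factor whose levels are whole numbers. The vector counts is a hypothetical example and is not part of the original data:

```r
#hypothetical factor whose levels are whole numbers
counts <- factor(c(10, 20, 20, 30, 10))

#as.integer() alone still returns the internal codes
as.integer(counts)

#[1] 1 2 2 3 1

#approach 1: go through character
as.integer(as.character(counts))

#[1] 10 20 20 30 10

#approach 2: convert the levels, then index by the factor
as.integer(levels(counts))[counts]

#[1] 10 20 20 30 10
```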

Which Method is Fastest?

It turns out that the as.numeric(levels(data))[data] approach is generally the more efficient of the two, and its advantage tends to grow as the vector gets longer relative to its number of unique levels.

For example, suppose we have a vector of length 500. Using the microbenchmark package, we can see that as.numeric(levels(data))[data] is much quicker than as.numeric(as.character(data)):

#define vector of length 500
data <- factor(sample(c(.15, .30, .45), 500, replace = TRUE))

#time how long it takes for each factor to numeric approach
library(microbenchmark)
microbenchmark(
  as.numeric(levels(data))[data],
  as.numeric(as.character(data))
)

#Unit: microseconds
#                           expr    min      lq     mean  median      uq     max
# as.numeric(levels(data))[data]  4.594  5.2075  6.85039  6.2345  7.4180  37.877
# as.numeric(as.character(data)) 25.978 26.7740 31.11130 28.4880 30.0325 139.076

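The reason that the as.numeric(levels(data))[data] approach is faster is that it converts only the unique levels to numbers (three strings in this example) and then uses the factor's integer codes to look up the result for each element. By contrast, as.numeric(as.character(data)) first builds a character vector with one string per element and then has to parse every one of those strings (500 in this example).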
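
As a rough sketch, the difference in the amount of work can be seen by counting how many strings each approach has to parse. The vector x below is recreated here only for illustration, and the seed is an arbitrary choice:

```r
#recreate a factor of length 500 (seed chosen arbitrarily for this sketch)
set.seed(0)
x <- factor(sample(c(.15, .30, .45), 500, replace = TRUE))

#number of strings parsed by as.numeric(as.character(x))
length(as.character(x))

#[1] 500

#number of strings parsed by as.numeric(levels(x))[x]
length(levels(x))

#[1] 3

#both approaches return exactly the same values
identical(as.numeric(as.character(x)), as.numeric(levels(x))[x])

#[1] TRUE
```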

Factor to Numeric/Integer Conversions Don’t Always Make Sense

Keep in mind that factor to numeric/integer conversions only make sense if the values of the factor vector are numeric/integer in nature.

For example, suppose we have the following factor vector that contains only the letters a, b, and c:

#make this example reproducible
set.seed(0)

#define factor vector that contains only letters
data <- factor(sample(c('a', 'b', 'c'), 10, replace = TRUE))

#view vector
data

#[1] c a b b c a c c b b
#Levels: a b c

#attempt to convert factor to numeric
as.numeric(levels(data))[data]

#[1] NA NA NA NA NA NA NA NA NA NA
#Warning message:
#NAs introduced by coercion 

Because none of the levels can be interpreted as a number, every element in the vector is converted to NA.
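
One way to guard against this is to check the levels before converting. The function below is a minimal sketch under that assumption; factor_to_numeric() is a hypothetical helper name and is not part of base R:

```r
#hypothetical helper: convert a factor to numeric, or stop if a level is not a number
factor_to_numeric <- function(f) {
  stopifnot(is.factor(f))

  #convert only the unique levels; suppress the coercion warning so we can check for NA ourselves
  vals <- suppressWarnings(as.numeric(levels(f)))

  #collect any levels that could not be converted
  bad <- levels(f)[is.na(vals)]
  if (length(bad) > 0) {
    stop("non-numeric levels: ", paste(bad, collapse = ", "))
  }

  #index the converted levels by the factor's integer codes
  vals[f]
}

#works when every level is a number
factor_to_numeric(factor(c("1.5", "2", "1.5")))

#[1] 1.5 2.0 1.5

#a call such as factor_to_numeric(factor(c("a", "b"))) stops with an error
#instead of silently returning NA values
```

Note that this sketch also rejects a level such as "NaN", since is.na() treats NaN as missing.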
