Suppose we have the following vector of class **factor** with 10 elements:

```r
#make this example reproducible
set.seed(0)

#create factor vector with 10 elements
data <- factor(sample(c(.15, .30, .45), 10, replace = TRUE))

#view vector
data

# [1] 0.45 0.15 0.3 0.3 0.45 0.15 0.45 0.45 0.3 0.3
#Levels: 0.15 0.3 0.45
```

To convert this vector into a numeric vector, we might try **as.numeric()**, but this simply returns the integer codes that R uses internally to index the factor levels, not the values themselves:

```r
as.numeric(data)

# [1] 3 1 2 2 3 1 3 3 2 2
```
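To see why this happens, it can help to look at how a factor is stored: an integer vector of codes plus a `levels` attribute holding the labels. The following is a minimal sketch using a small hand-built factor `f` (hypothetical example values, not the vector above):

```r
# A small factor built by hand (hypothetical example values)
f <- factor(c("0.45", "0.15", "0.3", "0.15"))

# The levels are the sorted unique labels, stored as character strings
levels(f)        # "0.15" "0.3" "0.45"

# Internally the factor is an integer vector of positions into levels(f)
codes <- unclass(f)
typeof(codes)    # "integer"

# as.numeric() returns these codes, not the labels
as.numeric(f)    # 3 1 2 1
```

Each code is the position of that element's label within `levels(f)`, which is why the output consists of 1s, 2s, and 3s rather than the original decimals.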

We run into the same problem when we attempt to convert this vector into an integer vector. Rather than the underlying values, we once again get the internal integer codes:

```r
as.integer(data)

# [1] 3 1 2 2 3 1 3 3 2 2
```

There are two simple ways to convert a factor to a numeric or integer vector in R:

**1. Use as.numeric(as.character(x))**

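The first way to convert a factor to a numeric or integer is to first convert it to a character. Note that **as.integer()** truncates decimals toward zero, so every value in this example becomes 0: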

```r
#convert factor to numeric
as.numeric(as.character(data))

#[1] 0.45 0.15 0.30 0.30 0.45 0.15 0.45 0.45 0.30 0.30

#convert factor to integer
as.integer(as.character(data))

#[1] 0 0 0 0 0 0 0 0 0 0
```

**2. Use as.numeric(levels(x))[x]**

The second way to convert a factor to a numeric or integer is to convert the levels with **levels()** and then index the result by the factor:

```r
#convert factor to numeric
as.numeric(levels(data))[data]

#[1] 0.45 0.15 0.30 0.30 0.45 0.15 0.45 0.45 0.30 0.30

#convert factor to integer
as.integer(levels(data))[data]

# [1] 0 0 0 0 0 0 0 0 0 0
```
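The integer conversions above all return 0 only because the levels are decimals below 1. As a minimal sketch, assuming a hypothetical factor `f` whose labels are whole numbers, both approaches recover the original integers and agree with each other:

```r
# Hypothetical factor whose labels are whole numbers
f <- factor(c("10", "20", "10", "30"))

# Approach 1: convert each element to character, then to integer
via_character <- as.integer(as.character(f))

# Approach 2: convert the levels once, then index by the factor
via_levels <- as.integer(levels(f))[f]

# Both approaches return the same integer vector
identical(via_character, via_levels)   # TRUE
via_levels                             # 10 20 10 30
```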

**Which Method is Fastest?**

It turns out that the larger the vector, the more efficient it is to use the **as.numeric(levels(data))[data]** approach.

For example, suppose we have a vector of length 500. Using the **microbenchmark** package, we can see that **as.numeric(levels(data))[data]** is much quicker than **as.numeric(as.character(data))**:

```r
#define vector of length 500
data <- factor(sample(c(.15, .30, .45), 500, replace = TRUE))

#time how long it takes for each factor to numeric approach
library(microbenchmark)

microbenchmark(
  as.numeric(levels(data))[data],
  as.numeric(as.character(data))
)

#Unit: microseconds
#                           expr    min      lq     mean  median      uq     max
# as.numeric(levels(data))[data]  4.594  5.2075  6.85039  6.2345  7.4180  37.877
# as.numeric(as.character(data)) 25.978 26.7740 31.11130 28.4880 30.0325 139.076
```

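The technical reason that the **as.numeric(levels(data))[data]** approach is faster is that it converts only the distinct levels to numbers and then indexes the result by the factor. By contrast, **as.character(data)** dispatches to the method **as.character.factor()**, whose result is equivalent to **levels(data)[data]**, so the subsequent **as.numeric()** call has to parse one string for every element of the vector rather than one per level.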
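The difference in work can be illustrated with a short sketch. The factor `f` and the seed here are arbitrary choices for illustration, not part of the benchmark above:

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible
f <- factor(sample(c(0.15, 0.30, 0.45), 500, replace = TRUE))

# as.character() on a factor gives the same result as indexing the levels
identical(as.character(f), levels(f)[f])   # TRUE

# Strings parsed by as.numeric(as.character(f)): one per element
length(as.character(f))                    # 500

# Strings parsed by as.numeric(levels(f))[f]: one per distinct level
nlevels(f)                                 # 3
```

With 500 elements but only 3 levels, the levels-based approach parses far fewer strings, and the gap widens as the vector grows while the number of levels stays small.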

**Factor to Numeric/Integer Conversions Don’t Always Make Sense**

Keep in mind that factor to numeric/integer conversions only make sense if the values of the factor vector are numeric/integer in nature.

For example, suppose we have the following factor vector that contains only the letters *a*, *b*, and *c*:

```r
#make this example reproducible
set.seed(0)

#define factor vector that contains only letters
data <- factor(sample(c('a', 'b', 'c'), 10, replace = TRUE))

#view vector
data

#[1] c a b b c a c c b b
#Levels: a b c

#attempt to convert factor to numeric
as.numeric(levels(data))[data]

#[1] NA NA NA NA NA NA NA NA NA NA
#Warning message:
#NAs introduced by coercion
```

All of the elements in the vector are simply converted to *NA*.
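One way to guard against this is to check the levels before converting. The helper below, `factor_to_numeric()`, is a hypothetical function written for this sketch (it is not part of base R); it stops with an error when any level cannot be parsed as a number, instead of silently returning *NA* values:

```r
# Hypothetical helper: convert a factor to numeric only if every
# level can be parsed as a number; otherwise stop with an error
factor_to_numeric <- function(f) {
  stopifnot(is.factor(f))
  # Convert the levels once; the coercion warning is suppressed
  # because failures are detected explicitly below
  lev <- suppressWarnings(as.numeric(levels(f)))
  if (anyNA(lev)) {
    stop("factor has non-numeric levels: ",
         paste(levels(f)[is.na(lev)], collapse = ", "))
  }
  # Index the converted levels by the factor's integer codes
  lev[f]
}

# Numeric-looking levels convert as expected
factor_to_numeric(factor(c("0.3", "0.15", "0.3")))

# Non-numeric levels produce an error instead of a vector of NAs
result <- tryCatch(
  factor_to_numeric(factor(c("a", "b"))),
  error = function(e) conditionMessage(e)
)
result   # "factor has non-numeric levels: a, b"
```

Whether an error, a warning, or a vector of *NA* values is the right outcome depends on the use case; this sketch simply makes the failure explicit.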