How to Perform One-Hot Encoding in R


One-hot encoding is used to convert categorical variables into a format that can be used by machine learning algorithms.

The basic idea of one-hot encoding is to create new variables that take on values 0 and 1 to represent the original categorical values.

For example, one-hot encoding converts a categorical variable that contains team names (A, B, and C) into three new variables that contain only 0 and 1 values.
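As a rough sketch of that idea in base R, each new variable can be built by hand as an indicator of one category. The `team` vector and the `encoded` data frame here are hypothetical names used only for illustration and are separate from the dataset used in the steps below:

```r
# Hypothetical vector of team names, used only for illustration
team <- c("A", "B", "C", "A")

# One indicator column per distinct value: 1 where the value matches, 0 otherwise
encoded <- data.frame(
  teamA = as.integer(team == "A"),
  teamB = as.integer(team == "B"),
  teamC = as.integer(team == "C")
)

encoded
```

Each row contains a single 1, in the column that matches that row's original team.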

The following step-by-step example shows how to perform one-hot encoding in R on a small dataset of team names and points.

Step 1: Create the Data

First, let’s create the following data frame in R:

#create data frame
df <- data.frame(team=c('A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'),
                 points=c(25, 12, 15, 14, 19, 23, 25, 29))

#view data frame
df

  team points
1    A     25
2    A     12
3    B     15
4    B     14
5    B     19
6    B     23
7    C     25
8    C     29
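Before encoding, it can help to check how many distinct categories the column holds, since that number determines how many indicator columns one-hot encoding will produce. A minimal sketch using base R, with the data frame recreated so the snippet runs on its own:

```r
# Recreate the data frame from Step 1 so this snippet is self-contained
df <- data.frame(team=c('A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'),
                 points=c(25, 12, 15, 14, 19, 23, 25, 29))

# Distinct categories in the 'team' column
unique(df$team)

# Number of rows that fall in each category
table(df$team)
```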

Step 2: Perform One-Hot Encoding

Next, let’s use the dummyVars() function from the caret package to perform one-hot encoding on the ‘team’ variable in the data frame:

library(caret)

#define one-hot encoder for all columns (numeric columns such as points pass through unchanged)
dummy <- dummyVars(" ~ .", data=df)

#perform one-hot encoding on data frame
final_df <- data.frame(predict(dummy, newdata=df))

#view final data frame
final_df

  teamA teamB teamC points
1     1     0     0     25
2     1     0     0     12
3     0     1     0     15
4     0     1     0     14
5     0     1     0     19
6     0     1     0     23
7     0     0     1     25
8     0     0     1     29 

Notice that three new columns were added to the data frame since the original ‘team’ column contained three unique values.

Also notice that the original ‘team’ column was dropped from the data frame since it’s no longer needed.
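For comparison, a similar result can be sketched without caret by using the base R function model.matrix(). This is an alternative approach rather than what dummyVars() does internally, and the names `team_dummies` and `base_df` are illustrative:

```r
# Recreate the data frame from Step 1 so this snippet is self-contained
df <- data.frame(team=c('A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'),
                 points=c(25, 12, 15, 14, 19, 23, 25, 29))

# '- 1' removes the intercept so that every level of 'team' gets its own column
team_dummies <- model.matrix(~ team - 1, data = df)

# Combine the indicator columns with the remaining numeric column
base_df <- data.frame(team_dummies, points = df$points)

# Sanity check: each row should have exactly one indicator set to 1
all(rowSums(team_dummies) == 1)

base_df
```

The sanity check is a quick way to confirm that every row was assigned to exactly one team.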

The one-hot encoding is complete, and the resulting all-numeric data frame can now be passed to most machine learning algorithms that require numeric inputs.
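One caveat: the three indicator columns always sum to 1, so models that include an intercept, such as linear regression, cannot estimate a separate coefficient for all three. A common workaround is to drop one category and treat it as the reference level. The sketch below does this in base R with model.matrix(); the name `reduced_df` is illustrative. The dummyVars() function also documents a fullRank argument for the same purpose, which is not shown here.

```r
# Recreate the data frame from Step 1 so this snippet is self-contained
df <- data.frame(team=c('A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'),
                 points=c(25, 12, 15, 14, 19, 23, 25, 29))

# With an intercept in the formula, model.matrix() uses the first level ('A') as the reference
mm <- model.matrix(~ team, data = df)

# Drop the intercept column, leaving indicators for 'B' and 'C' only
reduced_df <- data.frame(mm[, -1], points = df$points)

reduced_df
```

Rows for team A are then represented by a 0 in both remaining indicator columns.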

Note: The complete documentation for the dummyVars() function is available in the caret package reference, or by running ?dummyVars in R after loading caret.

Additional Resources

The following tutorials offer additional information about working with categorical variables:

How to Create Categorical Variables in R
How to Plot Categorical Data in R
Categorical vs. Quantitative Variables: What’s the Difference?
