How to Perform Label Encoding in R (With Examples)


Often in machine learning, we want to convert categorical variables into some type of numeric format that can be readily used by algorithms.

One way to do this is label encoding, which assigns each unique categorical value an integer based on alphabetical order.
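As a rough illustration of that definition, the sketch below replaces each value with its position in the sorted set of unique values, using only base R. The vector team here is a small made-up example, and the final line checks that this matches what factor() produces.

```r
# A minimal sketch of label encoding in base R:
# each value is replaced by its position among the sorted unique values.
team <- c("C", "A", "B", "A", "C")

# Sorted unique categories (alphabetical for these values)
categories <- sort(unique(team))

# Position of each value within the sorted categories
codes <- match(team, categories)
codes
# [1] 3 1 2 1 3

# Same codes as converting a factor to integers
identical(codes, as.integer(factor(team)))
# [1] TRUE
```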

For example, label encoding a categorical variable called Team replaces each unique team name with an integer, assigned in alphabetical order of the names.

There are two common ways to perform label encoding in R:

Method 1: Use Base R

df$my_var <- as.numeric(factor(df$my_var))
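Because factor() sorts its levels alphabetically by default, the integer codes in Method 1 follow that order. If a different order is wanted, the levels argument of factor() can set it explicitly. A short sketch, using a hypothetical size variable that is not part of the examples below:

```r
# Hypothetical variable whose natural order is not alphabetical
size <- c("small", "large", "medium", "small")

# Default: levels sorted alphabetically (large, medium, small)
as.numeric(factor(size))
# [1] 3 1 2 3

# Explicit levels: codes follow the order given here instead
as.numeric(factor(size, levels = c("small", "medium", "large")))
# [1] 1 3 2 1
```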

Method 2: Use CatEncoders Package

library(CatEncoders)

#define original categorical labels
labs = LabelEncoder.fit(df$my_var)

#convert labels to numeric values
df$my_var = transform(labs, df$my_var)
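The fit/transform split in Method 2 stores the learned mapping so it can be applied again later. A rough base-R analogue, assuming the categories seen in the original data are the only ones of interest, is to save the sorted categories once and encode other vectors against them with match(). The names train_levels and new_team are hypothetical and used only for illustration:

```r
# Original data used to learn the mapping
team <- c("A", "A", "B", "C")

# "Fit": store the sorted categories once
train_levels <- sort(unique(team))

# "Transform": encode any vector against the stored categories.
# In this base-R sketch, values not seen in the original data become NA.
new_team <- c("C", "A", "D")
new_codes <- match(new_team, train_levels)
new_codes
# codes: 3, 1, NA
```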

The following examples show how to use each method in practice.

Example 1: Label Encoding Using Base R

The following code shows how to use the factor() and as.numeric() functions from base R to convert a categorical variable called team into a numeric variable:

#create data frame
df <- data.frame(team=c('A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'),
                 points=c(25, 12, 15, 14, 19, 23, 25, 29))

#view data frame
df

  team points
1    A     25
2    A     12
3    B     15
4    B     14
5    B     19
6    B     23
7    C     25
8    C     29

#perform label encoding on team variable
df$team <- as.numeric(factor(df$team))

#view updated data frame
df

  team points
1    1     25
2    1     12
3    2     15
4    2     14
5    2     19
6    2     23
7    3     25
8    3     29

Notice the new values in the team column:

  • “A” has become 1.
  • “B” has become 2.
  • “C” has become 3.

We have successfully converted the team column from a categorical variable into a numeric variable.
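To see which integer was assigned to which category, one option is to keep the factor and inspect its levels(): the position of each level is its code. A small sketch that recreates the original team values and builds a named lookup vector (mapping is a hypothetical name):

```r
team <- c('A', 'A', 'B', 'B', 'B', 'B', 'C', 'C')
team_factor <- factor(team)

# Lookup table: each level name paired with its integer code
mapping <- setNames(seq_along(levels(team_factor)), levels(team_factor))
mapping
# A B C
# 1 2 3
```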

Example 2: Label Encoding Using CatEncoders Package

The following code shows how to use functions from the CatEncoders package to convert a categorical variable called team into a numeric variable:

library(CatEncoders)

#create data frame
df <- data.frame(team=c('A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'),
                 points=c(25, 12, 15, 14, 19, 23, 25, 29))

#define original categorical labels
labs = LabelEncoder.fit(df$team)

#convert labels to numeric values
df$team = transform(labs, df$team)

#view updated data frame
df

  team points
1    1     25
2    1     12
3    2     15
4    2     14
5    2     19
6    2     23
7    3     25
8    3     29

Once again, we have generated the following new values in the team column:

  • “A” has become 1.
  • “B” has become 2.
  • “C” has become 3.

This matches the results from the previous example.

Note that with this method, you can also use inverse.transform() to recover the original values of the team column:

#display original team labels
inverse.transform(labs, df$team)

[1] "A" "A" "B" "B" "B" "B" "C" "C"
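Base R has no inverse.transform(), but if the levels are saved before the column is overwritten, indexing the levels vector by the integer codes recovers the original labels. A sketch of that approach using the data from Example 1, where team_levels is a hypothetical name:

```r
df <- data.frame(team=c('A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'),
                 points=c(25, 12, 15, 14, 19, 23, 25, 29))

# Save the levels before overwriting the column
team_levels <- levels(factor(df$team))

# Label encoding, as in Example 1
df$team <- as.numeric(factor(df$team))

# Decode: each code is used as an index into team_levels
team_levels[df$team]
# [1] "A" "A" "B" "B" "B" "B" "C" "C"
```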

