Often in machine learning, we want to convert categorical variables into some type of numeric format that can be readily used by algorithms.

One way to do this is through **label encoding**, which assigns each unique categorical value an integer based on alphabetical order.

For example, label encoding converts each unique value in a categorical variable called **Team** into an integer based on alphabetical order.
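Because the codes follow sorted order rather than order of appearance, a small sketch can make the mapping concrete. The `team` vector below is a hypothetical example, not data from this tutorial:

```r
# Hypothetical example vector; values appear in non-alphabetical order
team <- c("C", "A", "B", "A")

# factor() sorts the unique values to build its levels ("A", "B", "C"),
# so the integer codes follow sorted order, not order of appearance
codes <- as.integer(factor(team))

codes
```

Here "C" is encoded as 3 even though it appears first, because it comes last in sorted order.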

There are two common ways to perform label encoding in R:

**Method 1: Use Base R**

    df$my_var <- as.numeric(factor(df$my_var))

**Method 2: Use CatEncoders Package**

    library(CatEncoders)

    #define original categorical labels
    labs = LabelEncoder.fit(df$my_var)

    #convert labels to numeric values
    df$my_var = transform(labs, df$my_var)

The following examples show how to use each method in practice.

**Example 1: Label Encoding Using Base R**

The following code shows how to use the **factor()** function from base R to convert a categorical variable called **team** into a numeric variable:

    #create data frame
    df <- data.frame(team=c('A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'),
                     points=c(25, 12, 15, 14, 19, 23, 25, 29))

    #view data frame
    df

      team points
    1    A     25
    2    A     12
    3    B     15
    4    B     14
    5    B     19
    6    B     23
    7    C     25
    8    C     29

    #perform label encoding on team variable
    df$team <- as.numeric(factor(df$team))

    #view updated data frame
    df

      team points
    1    1     25
    2    1     12
    3    2     15
    4    2     14
    5    2     19
    6    2     23
    7    3     25
    8    3     29

Notice the new values in the **team** column:

- “A” has become **1**.
- “B” has become **2**.
- “C” has become **3**.

We have successfully converted the **team** column from a categorical variable into a numeric variable.
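One thing to keep in mind is that overwriting the column discards the original labels. As a minimal sketch, assuming you want to keep a record of which label each code stands for, you could store the factor levels before encoding. The names `team_factor` and `mapping` are illustrative and not part of the original example:

```r
# Same data frame as in the example above
df <- data.frame(team = c("A", "A", "B", "B", "B", "B", "C", "C"),
                 points = c(25, 12, 15, 14, 19, 23, 25, 29))

# Keep the factor so its levels record which label each code stands for
team_factor <- factor(df$team)

# Lookup table: one row per label with its integer code
mapping <- data.frame(label = levels(team_factor),
                      code = seq_along(levels(team_factor)))

# Perform the label encoding itself
df$team <- as.integer(team_factor)

mapping
```

The `mapping` data frame can then be saved alongside the encoded data so the codes remain interpretable later.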

**Example 2: Label Encoding Using CatEncoders Package**

The following code shows how to use functions from the **CatEncoders** package to convert a categorical variable called **team** into a numeric variable:

    library(CatEncoders)

    #create data frame
    df <- data.frame(team=c('A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'),
                     points=c(25, 12, 15, 14, 19, 23, 25, 29))

    #define original categorical labels
    labs = LabelEncoder.fit(df$team)

    #convert labels to numeric values
    df$team = transform(labs, df$team)

    #view updated data frame
    df

      team points
    1    1     25
    2    1     12
    3    2     15
    4    2     14
    5    2     19
    6    2     23
    7    3     25
    8    3     29

Once again, we have generated the following new values in the **team** column:

- “A” has become **1**.
- “B” has become **2**.
- “C” has become **3**.

This matches the results from the previous example.

Note that using this method, you can also use **inverse.transform()** to obtain the original values from the **team** column:

    #display original team labels
    inverse.transform(labs, df$team)

    [1] "A" "A" "B" "B" "B" "B" "C" "C"
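If you used the base R method instead, a similar round trip can be sketched by keeping the factor levels yourself. This is an illustration in base R only, not the CatEncoders API; `team_levels`, `original`, and `new_codes` are hypothetical names, and the new values `"B"`, `"D"`, `"A"` are made up for the example:

```r
# Levels learned from the original team column in the examples above
team_levels <- c("A", "B", "C")

# Encoded values as they appear in the updated data frame
codes <- c(1, 1, 2, 2, 2, 2, 3, 3)

# Inverse transform: index the levels vector by the integer codes
original <- team_levels[codes]

# Encode new data with the same levels so codes stay consistent;
# a label not seen before (here "D") becomes NA
new_codes <- as.integer(factor(c("B", "D", "A"), levels = team_levels))

original
new_codes
```

Fixing the `levels` argument ensures that "B" maps to 2 in any later data, regardless of which labels happen to be present there.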

