You can use one of the following methods to create a categorical variable in pandas:

**Method 1: Create Categorical Variable from Scratch**

```python
df['cat_variable'] = ['A', 'B', 'C', 'D']
```

**Method 2: Create Categorical Variable from Existing Numerical Variable**

```python
df['cat_variable'] = pd.cut(df['numeric_variable'], bins=[0, 15, 25, float('Inf')], labels=['Bad', 'OK', 'Good'])
```

The following examples show how to use each method in practice.

**Example 1: Create Categorical Variable from Scratch**

The following code shows how to create a pandas DataFrame with one categorical variable called **team** and one numerical variable called **points**:

```python
import pandas as pd

#create DataFrame with one categorical variable and one numeric variable
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [12, 15, 19, 22, 24, 25, 26, 30]})

#view DataFrame
print(df)
```

```
  team  points
0    A      12
1    B      15
2    C      19
3    D      22
4    E      24
5    F      25
6    G      26
7    H      30
```

```python
#view data type of each column in DataFrame
print(df.dtypes)
```

```
team      object
points     int64
dtype: object
```

By using **df.dtypes**, we can see the data type of each variable in the DataFrame.

We can see:

- The variable **team** is an **object**.
- The variable **points** is an **integer** (`int64`).

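In pandas, the **object** dtype is typically used to store strings (character data), so the **team** variable is a categorical variable in the statistical sense. pandas also provides a dedicated **category** dtype for categorical data.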
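If you want pandas' dedicated `category` dtype rather than a plain `object` column, one option is `astype('category')`. The sketch below reuses the Example 1 data and is a minimal illustration, not the only way to do the conversion:

```python
import pandas as pd

# Same data as Example 1: 'team' starts out as a string (object) column
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [12, 15, 19, 22, 24, 25, 26, 30]})

# Convert 'team' to pandas' categorical dtype
df['team'] = df['team'].astype('category')

# 'team' is now reported as 'category'; 'points' is unchanged
print(df.dtypes)

# The distinct values are stored once as the column's categories
print(df['team'].cat.categories.tolist())
```

Storing each distinct label once in `cat.categories` is what distinguishes this dtype from an `object` column, which keeps every string value separately.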

**Example 2: Create Categorical Variable from Existing Numerical Variable**

The following code shows how to create a categorical variable called **status** from the existing numerical variable called **points** in the DataFrame:

```python
import pandas as pd

#create DataFrame with one categorical variable and one numeric variable
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [12, 15, 19, 22, 24, 25, 26, 30]})

#create categorical variable 'status' based on existing numerical 'points' variable
df['status'] = pd.cut(df['points'],
                      bins=[0, 15, 25, float('Inf')],
                      labels=['Bad', 'OK', 'Good'])

#view updated DataFrame
print(df)
```

```
  team  points status
0    A      12    Bad
1    B      15    Bad
2    C      19     OK
3    D      22     OK
4    E      24     OK
5    F      25     OK
6    G      26   Good
7    H      30   Good
```

Using the **cut()** function, we created a new categorical variable called **status** that takes the following values:

- '**Bad**' if the value in the points column is greater than 0 and less than or equal to 15.
- Else, '**OK**' if the value in the points column is less than or equal to 25.
- Else, '**Good**'.
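The "less than or equal to" boundaries above come from `pd.cut()`'s default `right=True`, which makes each bin closed on the right. A minimal sketch comparing this with `right=False` on the same points values (the names `status_right` and `status_left` are just for illustration):

```python
import pandas as pd

points = pd.Series([12, 15, 19, 22, 24, 25, 26, 30])
bins = [0, 15, 25, float('Inf')]
labels = ['Bad', 'OK', 'Good']

# Default right=True: intervals are (0, 15], (15, 25], (25, inf]
status_right = pd.cut(points, bins=bins, labels=labels)

# right=False: intervals become [0, 15), [15, 25), [25, inf)
status_left = pd.cut(points, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'points': points,
                    'right=True': status_right,
                    'right=False': status_left}))

# pd.cut returns a Series with the 'category' dtype
print(status_right.dtype)
```

Only the edge values change label: 15 moves from 'Bad' to 'OK' and 25 moves from 'OK' to 'Good' when `right=False`.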

Note that when using the **cut()** function, the number of **labels** must be one less than the number of bin edges passed to **bins**.

In our example, we used four values for **bins** to define the bin edges and three values for **labels** to specify the labels to use for the categorical variable.
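To check this rule, the sketch below passes one label too many. pandas raises a `ValueError` in that case rather than silently dropping a label; the variable names here are illustrative:

```python
import pandas as pd

points = pd.Series([12, 15, 19, 22, 24, 25, 26, 30])

# Four bin edges define three intervals: (0, 15], (15, 25], (25, inf]
bins = [0, 15, 25, float('Inf')]

# Three labels for three intervals: accepted
status = pd.cut(points, bins=bins, labels=['Bad', 'OK', 'Good'])
print(status.cat.categories.tolist())

# Four labels for three intervals: pandas rejects the mismatch
try:
    pd.cut(points, bins=bins, labels=['Bad', 'OK', 'Good', 'Great'])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)
```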
