Pandas: How to Create Column If It Doesn’t Exist


You can use the following basic syntax to create a column in a pandas DataFrame if it doesn’t already exist:

df['my_column'] = df.get('my_column', df['col1'] * df['col2']) 

This syntax creates a new column called my_column, defined as the product of the existing columns col1 and col2, only if my_column doesn’t already exist in the DataFrame. If the column does exist, df.get() returns the existing column, so its values are left unchanged.
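The same result can be reached with an explicit membership check. The following is a minimal sketch, assuming a DataFrame with columns named col1 and col2 as in the syntax above; the sample values are made up purely for illustration.

```python
import pandas as pd

# illustrative DataFrame (these values are hypothetical)
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

# compute and assign the column only when it is missing
if 'my_column' not in df.columns:
    df['my_column'] = df['col1'] * df['col2']

print(df)
```

One difference worth noting: Python evaluates function arguments before making the call, so the default passed to df.get() is always computed, even when the column already exists. With the explicit check, the product is only computed when the column is actually missing.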

The following example shows how to use this syntax in practice.

Example: Create Column in Pandas If It Doesn’t Exist

Suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'sales': [4, 6, 5, 8, 14, 13, 13, 12, 9, 8, 19, 14],
                   'price': [1, 2, 2, 1, 2, 4, 4, 3, 3, 2, 2, 3]})

#view DataFrame
print(df)

    day  sales  price
0     1      4      1
1     2      6      2
2     3      5      2
3     4      8      1
4     5     14      2
5     6     13      4
6     7     13      4
7     8     12      3
8     9      9      3
9    10      8      2
10   11     19      2
11   12     14      3

Now suppose we attempt to add a column called price, with each value equal to 100, only if it doesn’t already exist:

#attempt to add column called 'price'
df['price'] = df.get('price', 100)    

#view updated DataFrame
print(df)

    day  sales  price
0     1      4      1
1     2      6      2
2     3      5      2
3     4      8      1
4     5     14      2
5     6     13      4
6     7     13      4
7     8     12      3
8     9      9      3
9    10      8      2
10   11     19      2
11   12     14      3

Since a column called price already exists, df.get() returns that existing column instead of the default value of 100, so the price values in the DataFrame are unchanged.
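To see why the values are untouched, it can help to look at what df.get() returns in each case. The sketch below uses a shortened version of the DataFrame and a hypothetical column name, discount, that is assumed not to exist.

```python
import pandas as pd

# shortened version of the example DataFrame
df = pd.DataFrame({'sales': [4, 6, 5], 'price': [1, 2, 2]})

# keep a copy of the original values for comparison
before = df['price'].copy()

# 'price' exists, so df.get() returns it and the default is ignored
df['price'] = df.get('price', 100)
print(df['price'].equals(before))  # True

# 'discount' (a hypothetical name) does not exist, so the default is returned
print(df.get('discount', 100))  # 100
```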

However, suppose we attempt to add a new column called revenue, defined as the product of the sales and price columns, only if it doesn’t already exist:

#attempt to add column called 'revenue'
df['revenue'] = df.get('revenue', df['sales'] * df['price'])

#view updated DataFrame
print(df)

    day  sales  price  revenue
0     1      4      1        4
1     2      6      2       12
2     3      5      2       10
3     4      8      1        8
4     5     14      2       28
5     6     13      4       52
6     7     13      4       52
7     8     12      3       36
8     9      9      3       27
9    10      8      2       16
10   11     19      2       38
11   12     14      3       42

Since no column called revenue existed, df.get() returned the default value, so the revenue column is added to the DataFrame.
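If this pattern is needed for several columns, it could be wrapped in a small helper. The function below, add_if_missing, is a hypothetical name introduced only for this sketch and is not part of pandas. It takes a callable so that the values are computed only when the column is absent.

```python
import pandas as pd

def add_if_missing(df, name, make_values):
    """Add column `name` from make_values(df) only if it is absent.

    Hypothetical helper for illustration; it modifies df in place.
    """
    if name not in df.columns:
        df[name] = make_values(df)
    return df

# shortened version of the example DataFrame
df = pd.DataFrame({'sales': [4, 6, 5], 'price': [1, 2, 2]})

# 'revenue' is missing, so it is created
add_if_missing(df, 'revenue', lambda d: d['sales'] * d['price'])

# 'price' already exists, so it is left unchanged
add_if_missing(df, 'price', lambda d: 100)

print(df)
```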

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Drop Rows in Pandas DataFrame Based on Condition
How to Filter a Pandas DataFrame on Multiple Conditions
How to Use “NOT IN” Filter in Pandas DataFrame
