Pandas: Drop Columns if Name Contains Specific String


You can use the following methods to drop columns from a pandas DataFrame whose names contain a specific string:

Method 1: Drop Columns if Name Contains Specific String

df.drop(list(df.filter(regex='this_string')), axis=1, inplace=True)

Method 2: Drop Columns if Name Contains One of Several Specific Strings

df.drop(list(df.filter(regex='string1|string2|string3')), axis=1, inplace=True)
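Because the regex argument is interpreted as a regular expression, characters such as ( ) $ . have special meaning there. The following is a minimal sketch, not part of the original methods, of one way to build the pattern from a list of literal substrings using re.escape. The DataFrame demo and the list substrings are hypothetical names used only for illustration.

```python
import re
import pandas as pd

# Hypothetical DataFrame, used only for this illustration
demo = pd.DataFrame({'price($)': [1, 2],
                     'price_eur': [3, 4],
                     'qty': [5, 6]})

# Hypothetical list of literal substrings to look for in column names
substrings = ['($)', 'qty']

# re.escape makes characters such as ( $ ) match literally;
# '|' joins the escaped pieces into a single "any of these" pattern
pattern = '|'.join(re.escape(s) for s in substrings)

# Collect the matching column names, then drop them without modifying demo
cols_to_drop = list(demo.filter(regex=pattern))
result = demo.drop(columns=cols_to_drop)

print(result.columns.tolist())
```

Here drop(columns=...) is used instead of axis=1 with inplace=True, so demo is left unchanged and the result is returned as a new DataFrame.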

The examples below show how to use each method in practice with this pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team_name': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'team_location': ['AU', 'AU', 'EU', 'EU', 'AU', 'EU'],
                   'player_name': ['Andy', 'Bob', 'Chad', 'Dan', 'Ed', 'Fran'],
                   'points': [22, 29, 35, 30, 18, 12]})

#view DataFrame
print(df)

  team_name team_location player_name  points
0         A            AU        Andy      22
1         B            AU         Bob      29
2         C            EU        Chad      35
3         D            EU         Dan      30
4         E            AU          Ed      18
5         F            EU        Fran      12

Example 1: Drop Columns if Name Contains Specific String

We can use the following syntax to drop all columns in the DataFrame that contain ‘team’ anywhere in the column name:

#drop columns whose name contains 'team'
df.drop(list(df.filter(regex='team')), axis=1, inplace=True)

#view updated DataFrame
print(df)

  player_name  points
0        Andy      22
1         Bob      29
2        Chad      35
3         Dan      30
4          Ed      18
5        Fran      12

Notice that both columns that contained ‘team’ in the name have been dropped from the DataFrame. 
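If you would rather keep the original DataFrame intact, one possible alternative is to select the columns that do not match instead of dropping in place. The sketch below assumes the same example DataFrame; the names has_team and df_no_team are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Same example DataFrame as above
df = pd.DataFrame({'team_name': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'team_location': ['AU', 'AU', 'EU', 'EU', 'AU', 'EU'],
                   'player_name': ['Andy', 'Bob', 'Chad', 'Dan', 'Ed', 'Fran'],
                   'points': [22, 29, 35, 30, 18, 12]})

# Boolean mask: True for every column whose name contains 'team'
has_team = df.columns.str.contains('team')

# Keep only the columns where the mask is False; df itself is not modified
df_no_team = df.loc[:, ~has_team]

print(df_no_team.columns.tolist())
print(df.columns.tolist())
```

Since df still holds all four columns afterwards, later steps can continue to work from the original data.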

Example 2: Drop Columns if Name Contains One of Several Specific Strings

Starting again from the original DataFrame (Example 1 modified df in place, so re-create it first), we can use the following syntax to drop all columns that contain ‘player’ or ‘points’ anywhere in the column name:

#drop columns whose name contains 'player' or 'points'
df.drop(list(df.filter(regex='player|points')), axis=1, inplace=True)

#view updated DataFrame
print(df)

  team_name team_location
0         A            AU
1         B            AU
2         C            EU
3         D            EU
4         E            AU
5         F            EU

Notice that both columns that contained either ‘player’ or ‘points’ in the name have been dropped from the DataFrame.

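Note: The | symbol here is the regular-expression alternation (“OR”) operator. The regex argument of filter() matches a column if any of the alternatives appears anywhere in its name.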
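To see how the alternation behaves on its own, the short sketch below applies the same pattern to a plain list of column names with Python's built-in re module. The variable names columns, pattern and matched are hypothetical and used only for illustration.

```python
import re

# Column names from the example DataFrame
columns = ['team_name', 'team_location', 'player_name', 'points']

# 'player|points' matches a name containing 'player' OR 'points'
pattern = 'player|points'

# re.search looks for the pattern anywhere in each name
matched = [name for name in columns if re.search(pattern, name)]

print(matched)  # ['player_name', 'points']
```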

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Drop First Column in Pandas
How to Drop Duplicate Columns in Pandas
How to Drop All Columns Except Specific Ones in Pandas
