Pandas: How to Select Columns Based on Partial Match


You can use the following methods to select columns in a pandas DataFrame based on partial matching:

Method 1: Select Columns Based on One Partial Match

#select columns that contain 'team'
df.loc[:, df.columns.str.contains('team')]

Method 2: Select Columns Based on Multiple Partial Matches

#select columns that contain 'team' or 'rebounds'
df.loc[:, df.columns.str.contains('team|rebounds')] 

The following examples show how to use each method with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team_name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'team_points': [5, 7, 7, 9, 12, 9, 9, 4],
                   'assists': [11, 8, 10, 6, 6, 5, 9, 12],
                   'rebounds': [6, 7, 7, 6, 10, 12, 10, 9]})

#view DataFrame
print(df)

  team_name  team_points  assists  rebounds
0         A            5       11         6
1         A            7        8         7
2         A            7       10         7
3         A            9        6         6
4         B           12        6        10
5         B            9        5        12
6         B            9        9        10
7         B            4       12         9

Example 1: Select Columns Based on One Partial Match

The following code shows how to select all columns in the pandas DataFrame that contain ‘team’ in the column name:

#select columns that contain 'team'
df_team_cols = df.loc[:, df.columns.str.contains('team')]

#view results
print(df_team_cols)

  team_name  team_points
0         A            5
1         A            7
2         A            7
3         A            9
4         B           12
5         B            9
6         B            9
7         B            4

Notice that both columns that contain ‘team’ in the name are returned.

Example 2: Select Columns Based on Multiple Partial Matches

The following code shows how to select all columns in the pandas DataFrame that contain ‘team’ or ‘rebounds’ in the column name:

#select columns that contain 'team' or 'rebounds'
df_team_rebs = df.loc[:, df.columns.str.contains('team|rebounds')]

#view results
print(df_team_rebs)

  team_name  team_points  rebounds
0         A            5         6
1         A            7         7
2         A            7         7
3         A            9         6
4         B           12        10
5         B            9        12
6         B            9        10
7         B            4         9

All columns that contain either ‘team’ or ‘rebounds’ in the name are returned.

Note: The | operator represents “OR” in pandas.

Feel free to use as many of these operators as you’d like to search for as many partial string matches as you’d like.

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Select Columns by Name in Pandas
How to Select Columns by Index in Pandas
How to Select Columns by Data Type in Pandas

Leave a Reply

Your email address will not be published. Required fields are marked *