Pandas: How to Drop Rows that Contain a Specific String


You can use the following syntax to drop rows that contain a certain string in a pandas DataFrame:

df[df["col"].str.contains("this string")==False]

This tutorial shows several examples of how to use this syntax in practice with the following DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'conference': ['East', 'East', 'East', 'West', 'West', 'East'],
                   'points': [11, 8, 10, 6, 6, 5]})

#view DataFrame
df

  team conference  points
0    A       East      11
1    A       East       8
2    A       East      10
3    B       West       6
4    B       West       6
5    C       East       5

Example 1: Drop Rows that Contain a Specific String

The following code shows how to drop all rows in the DataFrame that contain ‘A’ in the team column:

df[df["team"].str.contains("A")==False]

  team conference  points
3    B       West       6
4    B       West       6
5    C       East       5
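If the column contains missing values, str.contains() returns NaN for those rows, which can make the filter behave unexpectedly. The sketch below assumes a small hypothetical DataFrame (df_missing is not part of the example above) and uses the na parameter of str.contains() to treat missing values as non-matches, so those rows are kept:

```python
import numpy as np
import pandas as pd

# hypothetical DataFrame with a missing value in the team column
df_missing = pd.DataFrame({'team': ['A', np.nan, 'B', 'C'],
                           'points': [11, 8, 6, 5]})

# na=False treats missing values as "does not contain the string"
mask = df_missing['team'].str.contains('A', na=False)

# keep only the rows where the mask is False
result = df_missing[~mask]
```

Passing na=True instead would treat missing values as matches, so those rows would be dropped along with the rows that contain ‘A’.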

Example 2: Drop Rows that Contain a String in a List

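The following code shows how to drop all rows in the DataFrame that contain ‘A’ or ‘B’ in the team column. Because str.contains() treats its argument as a regular expression by default, the | character in the pattern means "or":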

df[df["team"].str.contains("A|B")==False]

  team conference  points
5    C       East       5
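When the strings to drop are stored in an actual Python list, one possible approach is to build the pattern from the list. The sketch below assumes a hypothetical list named to_drop and uses re.escape() so that any special regex characters in the list are matched literally. It also shows isin() as an alternative when you want to match entire cell values rather than substrings:

```python
import re
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'conference': ['East', 'East', 'East', 'West', 'West', 'East'],
                   'points': [11, 8, 10, 6, 6, 5]})

# hypothetical list of strings to look for
to_drop = ['A', 'B']

# escape each string, then join with "|" to build an "A or B" pattern
pattern = '|'.join(re.escape(s) for s in to_drop)

# drop rows whose team value contains any string in the list
result = df[~df['team'].str.contains(pattern)]

# alternative: drop rows whose team value exactly equals a string in the list
exact = df[~df['team'].isin(to_drop)]
```

For this DataFrame both approaches leave only the row for team C, since every team value is a single letter.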

Example 3: Drop Rows that Contain a Partial String

In the previous examples, the search strings happened to match entire values in the team column. However, str.contains() matches substrings, so the same approach also works for partial strings.

For example, to drop rows that contain a partial string, we can use the following syntax:

#identify partial string to look for
discard = ["Wes"]

#drop rows that contain the partial string "Wes" in the conference column
df[~df.conference.str.contains('|'.join(discard))]

  team conference  points
0    A       East      11
1    A       East       8
2    A       East      10
5    C       East       5
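Two other parameters of str.contains() can be useful for partial matches. The following is a minimal sketch (the variable names no_west and no_east are only for illustration): case=False ignores letter case, and regex=False treats the search string as plain text rather than a regular expression.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'conference': ['East', 'East', 'East', 'West', 'West', 'East'],
                   'points': [11, 8, 10, 6, 6, 5]})

# case=False: lowercase "wes" still matches "West"
no_west = df[~df['conference'].str.contains('wes', case=False)]

# regex=False: "Eas" is matched as a literal substring, not as a regex
no_east = df[~df['conference'].str.contains('Eas', regex=False)]
```

Here no_west keeps the four East rows and no_east keeps the two West rows.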
