Pandas: How to Select Columns Based on Condition


You can use the following methods to select columns in a pandas DataFrame by condition:

Method 1: Select Columns Where At Least One Row Meets Condition

#select columns where at least one row has a value greater than 2
df.loc[:, (df > 2).any()]

Method 2: Select Columns Where All Rows Meet Condition

#select columns where all rows have a value greater than 2
df.loc[:, (df > 2).all()] 

Method 3: Select Columns Where At Least One Row Meets Multiple Conditions

#select columns where at least one row has a value between 10 and 15
df.loc[:, ((df>=10) & (df<=15)).any()]

The following examples show how to use each method with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'apples': [7, 3, 3, 4, 3],
                   'oranges': [2, 0, 2, 0, 1],
                   'bananas': [5, 0, 4, 0, 12]},
                    index=['Farm1', 'Farm2', 'Farm3', 'Farm4', 'Farm5'])

#view DataFrame
print(df)

       apples  oranges  bananas
Farm1       7        2        5
Farm2       3        0        0
Farm3       3        2        4
Farm4       4        0        0
Farm5       3        1       12

Example 1: Select Columns Where At Least One Row Meets Condition

We can use the following code to select the columns in the DataFrame where at least one row in the column has a value greater than 2:

#select columns where at least one row has a value greater than 2
df.loc[:, (df > 2).any()]

	apples	bananas
Farm1	7	5
Farm2	3	0
Farm3	3	4
Farm4	0	0
Farm5	3	12

Notice that the apples and bananas columns are returned because both of these columns have at least one row with a value greater than 2.

Example 2: Select Columns Where All Rows Meet Condition

We can use the following code to select the columns in the DataFrame where every row in the column has a value greater than 2:

#select columns where every row has a value greater than 2
df.loc[:, (df > 2).all()]

	apples
Farm1	7
Farm2	3
Farm3	3
Farm4	4
Farm5	3

Notice that only the apples column is returned because it is the only column where every row in the column has a value greater than 2.

Example 3: Select Columns Where At Least One Row Meets Multiple Conditions

We can use the following code to select the columns in the DataFrame where at least one row in the column has a value between 10 and 15:

#select columns where every row has a value greater than 2
df.loc[:, ((df>=10) & (df<=15)).any()]

	bananas
Farm1	5
Farm2	0
Farm3	4
Farm4	0
Farm5	12

Notice that only the bananas column is returned because it is the only column where at least one row in the column has a value between 10 and 15.

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Select Columns by Name in Pandas
How to Select Columns by Index in Pandas
How to Select Columns Containing a Specific String in Pandas

Featured Posts

Leave a Reply

Your email address will not be published. Required fields are marked *