Pandas: How to Use dropna() with thresh


You can use the dropna() function to drop rows from a pandas DataFrame that contain missing values.

You can also use the thresh argument to specify the minimum number of non-NaN values that a row or column must have in order to be kept in the DataFrame.
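One way to read thresh is as a minimum count of non-NaN values per row. The sketch below uses a small made-up DataFrame (the name demo and its values are assumptions for illustration, not the data used later in this tutorial) and compares dropna(thresh=2) with an explicit filter built from notna().

```python
import numpy as np
import pandas as pd

# hypothetical toy data, not the DataFrame used later in this tutorial
demo = pd.DataFrame({'x': [1.0, np.nan, 3.0],
                     'y': [np.nan, np.nan, 6.0],
                     'z': ['a', 'b', 'c']})

# number of non-NaN values in each row
non_nan_per_row = demo.notna().sum(axis=1)

# rows kept by thresh=2 should match rows whose count is at least 2
kept_by_thresh = demo.dropna(thresh=2)
kept_by_mask = demo[non_nan_per_row >= 2]

print(kept_by_thresh.equals(kept_by_mask))
```

If the two results match, the comparison prints True, which suggests that thresh acts as a "keep if count is at least this value" rule.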

Here are the most common ways to use the thresh argument in practice:

Method 1: Only Keep Rows with Minimum Number of non-NaN Values

#only keep rows with at least 2 non-NaN values
df.dropna(thresh=2)

Method 2: Only Keep Rows with Minimum % of non-NaN Values

#only keep rows with at least 70% non-NaN values
df.dropna(thresh=0.7*len(df.columns))

Method 3: Only Keep Columns with Minimum Number of non-NaN Values

#only keep columns with at least 6 non-NaN values
df.dropna(thresh=6, axis=1)

Method 4: Only Keep Columns with Minimum % of non-NaN Values

#only keep columns with at least 70% non-NaN values
df.dropna(thresh=0.7*len(df), axis=1)

The examples below show how to use each method with the following pandas DataFrame:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, np.nan, 19, 14, 14, 11, 20, np.nan],
                   'assists': [5, np.nan, np.nan, 9, np.nan, 9, 9, 4],
                   'rebounds': [11, np.nan, 10, 6, 6, 5, 9, np.nan]})

#view DataFrame
print(df)

  team  points  assists  rebounds
0    A    18.0      5.0      11.0
1    B     NaN      NaN       NaN
2    C    19.0      NaN      10.0
3    D    14.0      9.0       6.0
4    E    14.0      NaN       6.0
5    F    11.0      9.0       5.0
6    G    20.0      9.0       9.0
7    H     NaN      4.0       NaN

Example 1: Only Keep Rows with Minimum Number of non-NaN Values

We can use the following syntax to only keep the rows in the DataFrame that have at least 2 non-NaN values:

#only keep rows with at least 2 non-NaN values
df.dropna(thresh=2)

  team  points  assists  rebounds
0    A    18.0      5.0      11.0
2    C    19.0      NaN      10.0
3    D    14.0      9.0       6.0
4    E    14.0      NaN       6.0
5    F    11.0      9.0       5.0
6    G    20.0      9.0       9.0
7    H     NaN      4.0       NaN

Notice that the row in index position 1 has been dropped since it had only 1 non-NaN value (the team name).
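To check which rows fall below the threshold before dropping anything, you can count the non-NaN values in each row. This is a minimal sketch that rebuilds the same DataFrame so it runs on its own; the variable name row_counts is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# same DataFrame as defined earlier in this tutorial
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, np.nan, 19, 14, 14, 11, 20, np.nan],
                   'assists': [5, np.nan, np.nan, 9, np.nan, 9, 9, 4],
                   'rebounds': [11, np.nan, 10, 6, 6, 5, 9, np.nan]})

# count non-NaN values in each row
row_counts = df.notna().sum(axis=1)
print(row_counts)

# index labels of rows that thresh=2 would drop
print(row_counts[row_counts < 2].index.tolist())
```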

Example 2: Only Keep Rows with Minimum % of non-NaN Values

We can use the following syntax to only keep the rows in the DataFrame that have at least 70% non-NaN values:

#only keep rows with at least 70% non-NaN values
df.dropna(thresh=0.7*len(df.columns))

  team  points  assists  rebounds
0    A    18.0      5.0      11.0
2    C    19.0      NaN      10.0
3    D    14.0      9.0       6.0
4    E    14.0      NaN       6.0
5    F    11.0      9.0       5.0
6    G    20.0      9.0       9.0

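Notice that the rows in index positions 1 and 7 have been dropped. The DataFrame has 4 columns, so the threshold is 0.7 * 4 = 2.8, which means a row needs at least 3 non-NaN values to be kept. Rows 1 and 7 have only 1 and 2 non-NaN values, respectively.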
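The pandas documentation describes thresh as an integer, so one cautious option is to round the percentage up to a whole number yourself. The sketch below does this with math.ceil and an integer percentage; the names pct and min_non_nan are assumptions introduced here for illustration.

```python
import math

import numpy as np
import pandas as pd

# same DataFrame as defined earlier in this tutorial
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, np.nan, 19, 14, 14, 11, 20, np.nan],
                   'assists': [5, np.nan, np.nan, 9, np.nan, 9, 9, 4],
                   'rebounds': [11, np.nan, 10, 6, 6, 5, 9, np.nan]})

pct = 70                  # required percentage of non-NaN values per row
n_cols = len(df.columns)  # 4 columns in this DataFrame

# 70% of 4 columns is 2.8, which rounds up to 3 whole values
min_non_nan = math.ceil(pct * n_cols / 100)

result = df.dropna(thresh=min_non_nan)
print(min_non_nan)
print(result.index.tolist())
```

Writing the percentage as an integer (70 rather than 0.7) keeps the multiplication exact before the division, which helps avoid floating-point results that land just above a whole number and round up one step too far.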

Example 3: Only Keep Columns with Minimum Number of non-NaN Values

We can use the following syntax to only keep the columns in the DataFrame that have at least 6 non-NaN values:

#only keep columns with at least 6 non-NaN values
df.dropna(thresh=6, axis=1)

  team  points  rebounds
0    A    18.0      11.0
1    B     NaN       NaN
2    C    19.0      10.0
3    D    14.0       6.0
4    E    14.0       6.0
5    F    11.0       5.0
6    G    20.0       9.0
7    H     NaN       NaN

Notice that the ‘assists’ column has been dropped because it has only 5 non-NaN values, which is fewer than the required 6.
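The same kind of check works for columns. As a minimal sketch, df.count() returns the number of non-NaN values in each column, which shows why only ‘assists’ misses the cutoff; the variable name col_counts is an illustrative choice.

```python
import numpy as np
import pandas as pd

# same DataFrame as defined earlier in this tutorial
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, np.nan, 19, 14, 14, 11, 20, np.nan],
                   'assists': [5, np.nan, np.nan, 9, np.nan, 9, 9, 4],
                   'rebounds': [11, np.nan, 10, 6, 6, 5, 9, np.nan]})

# count non-NaN values in each column
col_counts = df.count()
print(col_counts)

# columns that thresh=6 with axis=1 would drop
print(col_counts[col_counts < 6].index.tolist())
```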

Example 4: Only Keep Columns with Minimum % of non-NaN Values

We can use the following syntax to only keep the columns in the DataFrame that have at least 70% non-NaN values:

#only keep columns with at least 70% non-NaN values
df.dropna(thresh=0.7*len(df), axis=1)

  team  points  rebounds
0    A    18.0      11.0
1    B     NaN       NaN
2    C    19.0      10.0
3    D    14.0       6.0
4    E    14.0       6.0
5    F    11.0       5.0
6    G    20.0       9.0
7    H     NaN       NaN

Notice that the ‘assists’ column has been dropped. The DataFrame has 8 rows, so the threshold is 0.7 * 8 = 5.6, which means a column needs at least 6 non-NaN values to be kept. The ‘assists’ column has only 5.
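The row-wise and column-wise percentage methods differ only in which dimension the percentage is taken of. One possible way to wrap both is a small helper like the one below. The function name dropna_min_pct and its parameters are hypothetical and not part of pandas; treat this as a sketch rather than a recommended API.

```python
import math

import numpy as np
import pandas as pd

def dropna_min_pct(frame, pct, axis=0):
    """Hypothetical helper: keep rows (axis=0) or columns (axis=1)
    that have at least pct percent non-NaN values."""
    # rows are measured against the number of columns, and vice versa
    total = frame.shape[1] if axis == 0 else frame.shape[0]
    # round up so that thresh is always a whole number
    thresh = math.ceil(pct * total / 100)
    return frame.dropna(thresh=thresh, axis=axis)

# same DataFrame as defined earlier in this tutorial
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, np.nan, 19, 14, 14, 11, 20, np.nan],
                   'assists': [5, np.nan, np.nan, 9, np.nan, 9, 9, 4],
                   'rebounds': [11, np.nan, 10, 6, 6, 5, 9, np.nan]})

rows_kept = dropna_min_pct(df, 70)          # same idea as Example 2
cols_kept = dropna_min_pct(df, 70, axis=1)  # same idea as Example 4

print(rows_kept.index.tolist())
print(cols_kept.columns.tolist())
```

Because dropna() returns a new DataFrame, the original df is left unchanged here; assign the result back to df if you want to keep only the filtered data.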

Note: You can find the complete documentation for the pandas dropna() function in the official pandas documentation.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: How to Reset Index After Using dropna()
Pandas: How to Use dropna() with Specific Columns
Pandas: How to Drop Rows Based on Multiple Conditions
