How to Use the mask() Function in Pandas


Often you may want to replace the values in a pandas DataFrame or Series wherever some condition is true.

A convenient way to do so is by using the mask() function, which uses the following syntax:

DataFrame.mask(cond, other=nan, inplace=False, axis=None, level=None)

where:

  • cond: A boolean pandas Series, pandas DataFrame, array, or callable; values are replaced wherever it is True
  • other: The replacement value (a scalar, Series, DataFrame, or callable); defaults to NaN
  • inplace: Whether to perform the operation in place or not
  • axis: Alignment axis if needed
  • level: Alignment level if needed
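To make the role of cond concrete, here is a minimal sketch that compares mask() with its counterpart where(), which keeps values where the condition is True instead of replacing them. The Series s and its values are made up purely for illustration.

```python
import pandas as pd

# hypothetical Series used only for illustration
s = pd.Series([5, 15, 25, 35])

# mask() replaces values where the condition is True
masked = s.mask(s > 20)

# where() keeps values where the condition is True,
# so inverting the condition gives the same result
kept = s.where(~(s > 20))

# cond may also be a callable that receives the Series
masked_callable = s.mask(lambda x: x > 20)

print(masked.equals(kept))
print(masked.equals(masked_callable))
```

Both comparisons should print True, since all three calls replace the same two values (25 and 35) with NaN.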

The following example shows how to use the mask() function in practice with a pandas DataFrame.

Example: How to Use the mask() Function in Pandas

Suppose we create the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
                   'points': [12, 18, 18, 22, 30, 41, 12],
                   'assists': [8, 10, 11, 11, 7, 12, 8]})

#view DataFrame
print(df)

  team  points  assists
0    A      12        8
1    A      18       10
2    B      18       11
3    B      22       11
4    C      30        7
5    C      41       12
6    C      12        8

Suppose that we would like to use the mask() function to convert each value in a column to NaN based on whether or not some condition is true.

For example, suppose we would like to convert any value in the points column that is greater than 20 to NaN.

We can use the following syntax to do so:

#convert any value in points column to NaN if greater than 20
df['points'].mask(df['points'] > 20)

0    12.0
1    18.0
2    18.0
3     NaN
4     NaN
5     NaN
6    12.0
Name: points, dtype: float64

From the output we can see that any value in the points column that is greater than 20 is now shown as NaN.

For example:

  • The first value (12) is not greater than 20, so it is left alone.
  • The second value (18) is not greater than 20, so it is left alone.
  • The third value (18) is not greater than 20, so it is left alone.
  • The fourth value (22) is greater than 20, so it is converted to NaN.

And so on.
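Notice that the remaining values are displayed as 12.0 and 18.0: NaN is a floating-point value, so the result is returned with a dtype of float64. If you would rather keep integers, one possible approach (an assumption on our part, not something the example above requires) is to cast the column to the nullable Int64 dtype before masking:

```python
import pandas as pd

df = pd.DataFrame({'points': [12, 18, 18, 22, 30, 41, 12]})

# default behavior: inserting NaN forces the result to float64
as_float = df['points'].mask(df['points'] > 20)

# assumption: casting to the nullable 'Int64' dtype first keeps integers
# and marks the masked entries as <NA> instead of NaN
as_nullable = df['points'].astype('Int64').mask(df['points'] > 20)

print(as_float.dtype)
print(as_nullable.dtype)
print(as_nullable.isna().sum())
```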

It’s worth noting that mask() returns a new Series by default (inplace=False), so the actual values in the original DataFrame have not been modified.

For example, if we print the original DataFrame again we will see that all values in the points column have remained unchanged:

#view DataFrame
print(df)

  team  points  assists
0    A      12        8
1    A      18       10
2    B      18       11
3    B      22       11
4    C      30        7
5    C      41       12
6    C      12        8
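If you do want to keep the masked values, a simple option is to assign the result back to the column. The sketch below does this on a copy (the name df_updated is hypothetical) so that the original df stays as shown above:

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
                   'points': [12, 18, 18, 22, 30, 41, 12],
                   'assists': [8, 10, 11, 11, 7, 12, 8]})

# work on a copy so the original DataFrame is left untouched
df_updated = df.copy()

# assign the masked Series back to the column to keep the change
df_updated['points'] = df_updated['points'].mask(df_updated['points'] > 20, 100)

print(df_updated)
```

Assigning the result back is generally easier to reason about than calling mask() with inplace=True on a selected column.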

We can also pass a replacement value as the second argument (other) instead of using the default NaN.

For example, we can use the following syntax to replace each value in the points column that is greater than 20 with a new value of 100:

#convert any value in points column to 100 if greater than 20
df['points'].mask(df['points'] > 20, 100)

0     12
1     18
2     18
3    100
4    100
5    100
6     12
Name: points, dtype: int64

We can see that the three values that were converted to NaN in the previous example have now been converted to 100, since this is the replacement value that we specified.

Feel free to use any replacement value that you would like in your own DataFrame.
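The replacement does not have to be a single number. As a rough sketch, the example below first fills the masked positions with values from another column, then applies one condition across several numeric columns at once. The variable names from_column, numeric, and capped are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
                   'points': [12, 18, 18, 22, 30, 41, 12],
                   'assists': [8, 10, 11, 11, 7, 12, 8]})

# replace points above 20 with the assists value from the same row
from_column = df['points'].mask(df['points'] > 20, df['assists'])

# apply one condition to several numeric columns at once:
# any value above 20 is replaced with 20
numeric = df[['points', 'assists']]
capped = numeric.mask(numeric > 20, 20)

print(from_column)
print(capped)
```

In the second call, only the points column changes because no assists value exceeds 20.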

Note: You can find the complete documentation for the mask() function in the official pandas documentation.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Use the nunique() Function in Pandas
How to Use the get_loc() Function in Pandas
How to Create a Tuple from Two Columns in Pandas
