Pandas: How to Use ffill Based on Condition


You can use the following basic syntax with the ffill() function in pandas to forward fill values in one column separately for each group defined by another column:

df['sales'] = df.groupby('store')['sales'].ffill()

This particular example replaces each NaN in the sales column with the most recent non-NaN sales value from an earlier row that has the same value in the store column. Rows that belong to other stores are ignored, even when they sit between the two rows.
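To see why the grouping matters, the following minimal sketch compares a plain ffill() with a grouped ffill(). The DataFrame named toy and its values are hypothetical and are used only for illustration:

```python
import numpy as np
import pandas as pd

#create a small hypothetical DataFrame with rows from two stores interleaved
toy = pd.DataFrame({'store': ['A', 'B', 'A', 'B'],
                    'sales': [10.0, 20.0, np.nan, np.nan]})

#plain ffill ignores the store column, so row 2 (store A) inherits 20 from store B
plain = toy['sales'].ffill()

#grouped ffill only looks back at earlier rows from the same store
grouped = toy.groupby('store')['sales'].ffill()

print(plain.tolist())    # [10.0, 20.0, 20.0, 20.0]
print(grouped.tolist())  # [10.0, 20.0, 10.0, 20.0]
```

The grouped result keeps the original row order and index, which is why it can be assigned straight back to the sales column.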

The following example shows how to use this syntax in practice.

Example: Use ffill Based on Condition in Pandas

Suppose we have the following pandas DataFrame that contains information about the total sales made by two different retail stores during four business quarters:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'store': ['A', 'A', 'B', 'A', 'B', 'A', 'B', 'B'],
                   'quarter': [1, 2, 1, 3, 2, 4, 3, 4],
                   'sales': [12, 22, 30, np.nan, 24, np.nan, np.nan, np.nan]})

#view DataFrame
print(df)

  store  quarter  sales
0     A        1   12.0
1     A        2   22.0
2     B        1   30.0
3     A        3    NaN
4     B        2   24.0
5     A        4    NaN
6     B        3    NaN
7     B        4    NaN

Notice that there are multiple NaN values in the sales column.

Suppose we would like to fill in these NaN values using the previous value in the sales column, but only with values that belong to the same store.

We can use the following syntax to do so:

#group by store and forward fill values in sales column
df['sales'] = df.groupby('store')['sales'].ffill()

#view updated DataFrame
print(df)

  store  quarter  sales
0     A        1   12.0
1     A        2   22.0
2     B        1   30.0
3     A        3   22.0
4     B        2   24.0
5     A        4   22.0
6     B        3   24.0
7     B        4   24.0

Notice that each NaN value in the sales column has been replaced by the most recent non-NaN sales value recorded for the same store.

For example:

  • The NaN value in row index position 3 has been replaced by the value 22, which was the most recent value in the sales column that corresponded to store A.
  • The NaN value in row index position 6 has been replaced by the value 24, which was the most recent value in the sales column that corresponded to store B.

And so on.
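One caveat: a forward fill can only use values that appear earlier within the same group. The minimal sketch below uses a small hypothetical DataFrame named toy (not the data from the example above) in which the first row for store B is missing, so there is nothing to fill it with:

```python
import numpy as np
import pandas as pd

#hypothetical data: the first row for store B has no sales value
toy = pd.DataFrame({'store': ['A', 'B', 'B'],
                    'sales': [5.0, np.nan, 7.0]})

#group by store and forward fill values in sales column
filled = toy.groupby('store')['sales'].ffill()

#count the NaN values that are left after the fill
remaining = int(filled.isna().sum())

print(filled.tolist())  # [5.0, nan, 7.0]
print(remaining)        # 1
```

Checking the count of remaining NaN values after a grouped fill is a quick way to spot groups that start with a missing value.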

Note: The complete documentation for the pandas ffill() function is available in the official pandas documentation.
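The grouped ffill() function also accepts a limit argument that caps how many consecutive NaN values are filled within each group. The sketch below rebuilds the DataFrame from the example and uses limit=1, a value chosen here only for illustration:

```python
import numpy as np
import pandas as pd

#recreate the DataFrame from the example
df = pd.DataFrame({'store': ['A', 'A', 'B', 'A', 'B', 'A', 'B', 'B'],
                   'quarter': [1, 2, 1, 3, 2, 4, 3, 4],
                   'sales': [12, 22, 30, np.nan, 24, np.nan, np.nan, np.nan]})

#fill at most one consecutive NaN per store
limited = df.groupby('store')['sales'].ffill(limit=1)

print(limited.tolist())
# [12.0, 22.0, 30.0, 22.0, 24.0, nan, 24.0, nan]
```

With this setting, only the first missing quarter for each store is filled, and the second consecutive missing quarter stays NaN.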

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Count Missing Values in Pandas
How to Drop Rows with NaN Values in Pandas
How to Drop Rows that Contain a Specific Value in Pandas
