How to Use the bfill() Function in Pandas


Often you may want to fill each NaN value in a pandas DataFrame with the next valid observation in the same column.

The easiest way to do so is by using the bfill() function, which uses the following syntax:

DataFrame.bfill(axis=None, inplace=False, limit=None, limit_area=None, …)

where:

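  • axis: The axis along which to fill (0 or 'index' fills down each column, 1 or 'columns' fills across each row)
  • inplace: Whether to modify the object in place instead of returning a new one
  • limit: Max number of consecutive NaN values to fill
  • limit_area: Whether to fill only NaN values surrounded by valid values ('inside') or only those outside them ('outside'); None applies no restriction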
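To make the axis argument concrete, here is a minimal sketch on a small hypothetical DataFrame (the name demo and the columns q1, q2 and q3 are made up for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the axis argument
demo = pd.DataFrame({'q1': [1.0, np.nan],
                     'q2': [np.nan, 5.0],
                     'q3': [3.0, 6.0]})

# axis=0 (the default direction): each NaN takes the next valid value further down its column
down = demo.bfill(axis=0)

# axis=1: each NaN takes the next valid value further right in its row
across = demo.bfill(axis=1)

print(down)
print(across)
```

With axis=0 the NaN in the last row of q1 stays NaN because no later row exists to copy from, while with axis=1 the same cell is filled from q2.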

The following example shows how to use the bfill() function in practice with a pandas DataFrame.

Example: How to Use the bfill() Function in Pandas

Suppose we create the following pandas DataFrame that contains information about various basketball players:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
                   'points': [12, np.nan, np.nan, 22, 30, 41, 12],
                   'assists': [8, 10, 11, 11, 7, np.nan, 8]})

#view DataFrame
print(df)

  team  points  assists
0    A    12.0      8.0
1    A     NaN     10.0
2    B     NaN     11.0
3    B    22.0     11.0
4    C    30.0      7.0
5    C    41.0      NaN
6    C    12.0      8.0

Notice that there are several NaN values in the DataFrame.

Suppose that we would like to use the bfill() function to fill in the missing values in the DataFrame.

We can use the following syntax to do so:

#fill in NaN values in each column of DataFrame
df.bfill()

  team  points  assists
0    A    12.0      8.0
1    A    22.0     10.0
2    B    22.0     11.0
3    B    22.0     11.0
4    C    30.0      7.0
5    C    41.0      8.0
6    C    12.0      8.0

Notice that each of the NaN values has been filled in with the next available value in its column.
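Two details are worth keeping in mind here. Calling bfill() returns a new object and leaves the original unchanged unless the result is assigned back, and a NaN value at the very end of a column stays NaN because there is no later value to copy. The sketch below checks both on a small hypothetical Series (s and filled are illustrative names, not part of the example above):

```python
import numpy as np
import pandas as pd

# Hypothetical series whose last value is missing
s = pd.Series([1.0, np.nan, 3.0, np.nan])

# bfill() returns a new Series; s itself is not modified
filled = s.bfill()

# The trailing NaN has no later value to copy, so it remains NaN
print(filled.tolist())

# The original series still contains both of its NaN values
print(s.isna().sum())
```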

Note that we can also fill in NaN values for one specific column if we’d like.

For example, we can use the following syntax to fill in the NaN values in the points column only:

#fill in NaN values in points column only
df['points'] = df['points'].bfill()

#view updated DataFrame
print(df)

  team  points  assists
0    A    12.0      8.0
1    A    22.0     10.0
2    B    22.0     11.0
3    B    22.0     11.0
4    C    30.0      7.0
5    C    41.0      NaN
6    C    12.0      8.0

Notice that only the NaN values in the points column have been filled in while the NaN value in the assists column has remained untouched.

Also note that we can use the limit argument to limit the number of consecutive NaN values that should be filled in.

For example, starting again from the original DataFrame, we can specify limit=1 to fill in at most one NaN value in each run of consecutive NaN values in the points column:

#fill in NaN values in points column only (limit of 1)
df['points'] = df['points'].bfill(limit=1)

#view updated DataFrame
print(df)

  team  points  assists
0    A    12.0      8.0
1    A     NaN     10.0
2    B    22.0     11.0
3    B    22.0     11.0
4    C    30.0      7.0
5    C    41.0      NaN
6    C    12.0      8.0

Notice that only the NaN value at index 2, the one directly before the next valid value of 22, has been filled in, while the NaN value at index 1 has been left untouched.

In practice, you may choose to use the limit argument when a NaN value should only be filled in if it sits close to the next available value, rather than filling in a long run of consecutive NaN values.
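As a quick way to compare different limits, the sketch below applies limit=1 and limit=2 to a standalone Series holding the same points values as the original DataFrame (the names points, one and two are illustrative):

```python
import numpy as np
import pandas as pd

# Same points values as the original DataFrame in this tutorial
points = pd.Series([12, np.nan, np.nan, 22, 30, 41, 12])

# limit=1 fills at most one NaN per gap, starting with the one nearest the next valid value
one = points.bfill(limit=1)

# limit=2 fills at most two NaN values per gap, which closes the only gap here
two = points.bfill(limit=2)

# Count the NaN values that remain after each fill
print(one.isna().sum())
print(two.isna().sum())
```

Counting the remaining NaN values this way makes it easy to see how much of each gap a given limit leaves unfilled.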

Note: You can find the complete documentation for the bfill() function in the official pandas documentation.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Use the Rolling.apply() Function in Pandas
How to Use the nunique() Function in Pandas
How to Use the get_loc() Function in Pandas
How to Use idxmin() Function in Pandas
