How to Use the combine_first() Function in Pandas


Often you may want to replace null values in one pandas DataFrame with the corresponding elements from another DataFrame.

A straightforward way to do so is by using the combine_first() function, which is designed to perform this exact task.

The combine_first() function uses the following syntax:

DataFrame.combine_first(other)

where:

  • other: The pandas DataFrame whose values are used to fill the null values in the calling DataFrame.

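It’s important to note that combine_first() aligns the two DataFrames on their index and column labels, so a null value is only filled when the other DataFrame has a non-null value at the same row label and column name. The two DataFrames do not need to have the same dimensions, and the result contains the union of their rows and columns.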
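As a quick illustration of how combine_first() matches values by index label rather than by position or shape, consider the following minimal sketch. The DataFrames named left and right are hypothetical and are not used elsewhere in this tutorial:

```python
import pandas as pd
import numpy as np

# hypothetical DataFrames with different numbers of rows
left = pd.DataFrame({'sales': [120, np.nan]}, index=[0, 1])
right = pd.DataFrame({'sales': [999, 200, 300]}, index=[0, 1, 2])

# values are matched by index label; only the NaN in left is filled from right
result = left.combine_first(right)

print(result)
```

In this sketch the value at index 0 stays 120 because it is not null in left, the NaN at index 1 is filled with 200, and index 2 appears in the result even though it only exists in right.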

The following example shows how to use the combine_first() function in practice with a pandas DataFrame.

Example: How to Use the combine_first() Function in Pandas

Suppose we create the following pandas DataFrame that contains information about the total sales made by various employees at some company:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'employee': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
                   'sales': [120, np.nan, 80, 75, 75, np.nan, 150]})

#view DataFrame
print(df)

  employee  sales
0        A  120.0
1        B    NaN
2        C   80.0
3        D   75.0
4        E   75.0
5        F    NaN
6        G  150.0

Notice that several values in the sales column are NaN, which are considered null values.

Suppose that we would like to fill in these NaN values by using the corresponding values from another DataFrame with the same index and column names.

Suppose we have another DataFrame named df2 that contains data for the same employees with the same column names:

import pandas as pd

#create DataFrame
df2 = pd.DataFrame({'employee': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
                   'sales': [120, 200, 80, 75, 75, 300, 150]})

#view DataFrame
print(df2)

  employee  sales
0        A    120
1        B    200
2        C     80
3        D     75
4        E     75
5        F    300
6        G    150

Notice that this DataFrame does not have any NaN values in the sales column.

Suppose that we would like to use df2 to fill in the missing values in the df DataFrame.

We can use the combine_first() function with the following syntax to do so:

#replace missing values in df with corresponding elements from df2
df.combine_first(df2)

  employee  sales
0        A  120.0
1        B  200.0
2        C   80.0
3        D   75.0
4        E   75.0
5        F  300.0
6        G  150.0

Notice that in the returned DataFrame, each of the NaN values from the DataFrame named df has been replaced with the corresponding value from the DataFrame named df2.

In particular, we can see:

  • The NaN value in row index 1 has been replaced with a value of 200.
  • The NaN value in row index 5 has been replaced with a value of 300.
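Keep in mind that combine_first() returns a new DataFrame and does not modify df itself. The following sketch, which rebuilds the two DataFrames from this example and uses a hypothetical variable name df_filled, shows one way to store the result and confirm that the NaN values are gone:

```python
import pandas as pd
import numpy as np

# recreate the two DataFrames from the example
df = pd.DataFrame({'employee': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
                   'sales': [120, np.nan, 80, 75, 75, np.nan, 150]})
df2 = pd.DataFrame({'employee': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
                    'sales': [120, 200, 80, 75, 75, 300, 150]})

# store the filled result in a new variable (df_filled is a hypothetical name)
df_filled = df.combine_first(df2)

# count NaN values in the filled result and in the original DataFrame
print(df_filled['sales'].isna().sum())
print(df['sales'].isna().sum())
```

The first count is 0 and the second is 2, which confirms that the original DataFrame is left unchanged unless you assign the result back to df.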

It’s worth noting that we used the combine_first() function in this example to replace NaN values in one column of a DataFrame. However, you can use the same function to replace NaN values in multiple columns of a given DataFrame.

Feel free to use this function to replace NaN values in as many columns as you’d like in your own DataFrame.
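For instance, here is a minimal sketch with NaN values in two columns. The DataFrames df_a and df_b and the returns column are hypothetical and only serve to illustrate the idea:

```python
import pandas as pd
import numpy as np

# hypothetical DataFrame with NaN values in two columns
df_a = pd.DataFrame({'sales': [120, np.nan, 80],
                     'returns': [np.nan, 4, np.nan]})

# hypothetical DataFrame with no NaN values
df_b = pd.DataFrame({'sales': [100, 200, 300],
                     'returns': [1, 2, 3]})

# fill NaN values in every column of df_a using df_b
filled = df_a.combine_first(df_b)

print(filled)
```

Here the NaN in the sales column is filled with 200, and the two NaN values in the returns column are filled with 1 and 3, while all existing values in df_a are kept.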

Note: You can find the complete documentation for the combine_first() function in the official pandas documentation.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Use the Rolling.apply() Function in Pandas
How to Use the nunique() Function in Pandas
How to Use the get_loc() Function in Pandas
How to Use idxmin() Function in Pandas
