Pandas: How to Merge Columns Sharing Same Name


You can use the following basic syntax to merge together columns in a pandas DataFrame that share the same column name:

#define function to merge columns with same names together
def same_merge(x): return ','.join(x[x.notnull()].astype(str))

#define new DataFrame that merges columns with same names together
df_new = df.groupby(level=0, axis=1).apply(lambda x: x.apply(same_merge, axis=1))

The following example shows how to use this syntax in practice.

Example: Merge Together Columns Sharing Same Name in Pandas

Suppose we have the following pandas DataFrame:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'A': [5, 6, 8, np.nan, 4, np.nan, np.nan],
                   'A1': [np.nan, 12, np.nan, 10, np.nan, 6, 4],
                   'B': [2, 7, np.nan, np.nan, 2, 4, np.nan],
                   'B1': [5, np.nan, 6, 15, 1, np.nan, 4]})

#rename columns so there are duplicate column names
df.columns = ['A', 'A', 'B', 'B']

#view DataFrame
print(df)

     A     A    B     B
0  5.0   NaN  2.0   5.0
1  6.0  12.0  7.0   NaN
2  8.0   NaN  NaN   6.0
3  NaN  10.0  NaN  15.0
4  4.0   NaN  2.0   1.0
5  NaN   6.0  4.0   NaN
6  NaN   4.0  NaN   4.0

Notice that two columns have a name of ‘A’ and two columns have a name of ‘B.’

We can use the following code to merge the columns that have the same column names and concatenate their values together with a comma:

#define function to merge columns with same names together
def same_merge(x): return ','.join(x[x.notnull()].astype(str))

#define new DataFrame that merges columns with same names together
df_new = df.groupby(level=0, axis=1).apply(lambda x: x.apply(same_merge, axis=1))

#view new DataFrame
print(df_new)

          A        B
0       5.0  2.0,5.0
1  6.0,12.0      7.0
2       8.0      6.0
3      10.0     15.0
4       4.0  2.0,1.0
5       6.0      4.0
6       4.0      4.0

The new DataFrame has merged together the columns with the same names and concatenated their values together with a comma.

If you would like to use a different separator, simply change the comma separator to something else in the same_merge() function.

For example, the following code shows how to use a semi-colon separator instead:

#define function to merge columns with same names together
def same_merge(x): return ';'.join(x[x.notnull()].astype(str))

#define new DataFrame that merges columns with same names together
df_new = df.groupby(level=0, axis=1).apply(lambda x: x.apply(same_merge, axis=1))

#view new DataFrame
print(df_new)

          A        B
0       5.0  2.0;5.0
1  6.0;12.0      7.0
2       8.0      6.0
3      10.0     15.0
4       4.0  2.0;1.0
5       6.0      4.0
6       4.0      4.0

The new DataFrame has merged together the columns with the same names and concatenated their values together with a semi-colon.

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Drop Duplicate Columns in Pandas
How to List All Column Names in Pandas
How to Sort Columns by Name in Pandas

Featured Posts

Leave a Reply

Your email address will not be published. Required fields are marked *