How to Use unstack() in Pandas


Often you may want to unstack a pandas Series or DataFrame so that one or more levels of a MultiIndex can be pivoted out to columns.

The easiest way to do so is by using the unstack() function, which uses the following basic syntax:

df.unstack(level=-1, fill_value=None, sort=True)

where:

  • level: Levels of the index to unstack (default of -1 is last level)
  • fill_value: Replace NaN with this value if missing values are produced
  • sort: Whether or not to sort the levels in the resulting Multiindex columns

The following example shows how to use the unstack() function in practice in various scenarios.

Example: How to Use unstack() in Pandas

Suppose we create the following pandas Series with a MultiIndex:

import pandas as pd

#define MultiIndex
index = pd.MultiIndex.from_tuples([('Mavs', 'Guard'), ('Mavs', 'Forward'),
                                   ('Heat', 'Guard'), ('Heat', 'Forward')])

#create series with MultiIndex
my_data = pd.Series([14, 39, 25, 20], index=index)

#view series
my_data

Mavs  Guard      14
      Forward    39
Heat  Guard      25
      Forward    20
dtype: int64

Notice that the pandas Series contains two levels in the index:

  • The first level contains the team name of the player
  • The second level contains the position of the player

The final column then contains the values of the points scored by each player.

Suppose that we would like to unstack the MultiIndex so that the last level (the “position” level) is pivoted out into columns.

We can use the following syntax to do so:

#unstack last level of MultiIndex
my_data.unstack()

	Forward	Guard
Heat	20	25
Mavs	39	14

Notice that the last level of the MultiIndex has been pivoted out into columns.

The Series is now shaped in such a way that the team names are shown as row names and the position names are shown as column names.

Note that the unstack() function pivots out the last level of the MultiIndex by default. This means that specifying level=1 would produce the same result:

#unstack last level of MultiIndex
my_data.unstack(level=1)

        Forward	Guard
Heat	20	25
Mavs	39	14

If we would instead like to pivot out the first level in the MultiIndex (the “team” column) then we could specify level=0 instead:

#unstack first level of MultiIndex
my_data.unstack(level=0)

	Heat	Mavs
Forward	20	39
Guard	25	14

Notice that the position names are now shown as the index values and the team names have been pivoted out to be the column names.

Note that in this particular example, there are exactly two positions assigned for each team.

However, if there is an occurrence of one position name that does not occur for each team then it’s possible that NaN values will be produced when we use the unstack() function.

In this scenario, you could use the fill_value argument to specify the value that should be used instead of displaying NaN as the result.

For example, you could use the following syntax to display a value of 0 instead of NaN:

#unstack last level of MultiIndex and display 0 if NaN's are produced
my_data.unstack(fill_value=0)

Note: You can find the complete documentation for the pandas unstack() function here.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: How to Sort Rows by Date
Pandas: How to Sort Columns by Name
Pandas: How to Sort by Both Index and Column

Featured Posts

Leave a Reply

Your email address will not be published. Required fields are marked *