How to Create Pivot Tables in Python


pivot table is a type of table that summarizes a dataset using summary statistics.

We can create pivot tables in Python by using the pivot_table function from the pandas package, which uses the following syntax:

pivot_table(data, index=None, values=None, aggfunc=’mean’, fill_value=None, …)

where:

  • data: name of DataFrame
  • index: columns of DataFrame to group by
  • values: columns of DataFrame to aggregate
  • aggfunc: aggregate function to use. The default is mean.
  • fill_value: value to replace missing values with.

Example: Pivot Tables in Python

Suppose we have the following dataset that contains information about eight basketball players:

import pandas as pd

data = {'team': ['E', 'E', 'C', 'C', 'C', 'W', 'W', 'W'],
        'player': ['Andy', 'Ben', 'Chris', 'Dwight', 'Elias', 'Frank', 'Greg', 'Hank'],
        'rebounds': [12, 14, 13, 7, 8, 8, 9, 13],
        'points': [22, 24, 26, 26, 29, 32, 20, 14]
        }

df = pd.DataFrame(data,columns=['team', 'player', 'rebounds','points'])
df

   team  player  rebounds  points
0     E    Andy        12      22
1     E     Ben        14      24
2     C   Chris        13      26
3     C  Dwight         7      26
4     C   Elias         8      29
5     W   Frank         8      32
6     W    Greg         9      20
7     W    Hank        13      14

To find the mean number of points and rebounds scored by players on each team, we can use the following syntax:

pd.pivot_table(df,index=['team'])

     points   rebounds
team 
   C     27   9.333333
   E     23  13.000000
   W     22  10.000000

To only find the mean number of points scored by players on each team, we can use the following syntax:

pd.pivot_table(df,index=['team'], values=['points'])

     points
team 
   C     27
   E     23
   W     22

To find the total number of points scored by players on each team, we can use the following syntax:

pd.pivot_table(df,index=['team'], values=['points'], aggfunc='sum')

     points
team 
   C     81
   E     46
   W     66

To find the max number of points scored by players on each team, we can use the following syntax:

pd.pivot_table(df,index=['team'], values=['points'], aggfunc='max')

     points
team 
   C     29
   E     34
   W     32

To find the total number of individual players on each team, we can use the following syntax:

pd.pivot_table(df,index=['team'], values=['player'], aggfunc='count')

     player
team 
   C      3
   E      2
   W      3

Dealing with Missing Values

Suppose you have missing values in your dataset:

import pandas as pd 
import numpy as np
 
data = {'team': ['E', 'E', 'C', 'C', 'C', 'W', 'W', 'W'],
        'player': ['Andy', 'Ben', 'Chris', 'Dwight', 'Elias', 'Frank', 'Greg', 'Hank'],
        'rebounds': [12, 14, np.NaN, 7, 8, np.NaN, 9, 13],
        'points': [22, 24, 26, 26, 29, 32, np.NaN, 14]
        }

df = pd.DataFrame(data,columns=['team', 'player', 'rebounds','points'])
df

   team  player    rebounds   points
0     E    Andy        12.0     22.0
1     E     Ben        14.0     24.0
2     C   Chris         NaN     26.0
3     C  Dwight         7.0     26.0
4     C   Elias         8.0     29.0
5     W   Frank         NaN     32.0
6     W    Greg         9.0      NaN
7     W    Hank        13.0     14.0

To calculate summary statistics, you can use numpy functions which automatically disregard missing values. For example, here’s how to calculate the mean number of points scored by players on each team:

pd.pivot_table(df,index=['team'], values=['points'], aggfunc=[np.mean])

	mean
      points
team	
C	27.0
E	23.0
W	23.0

Filtering Pivot Tables

Once you’ve created a pivot table, you can filter it.

For example, suppose we only want to view the mean statistics for players who are on team W:

pivotTable = pd.pivot_table(df,index=['team'])

pivotTable.query('team == ["W"]')

     points  rebounds
team 
   W   23.0      11.0

Or suppose we only want to view the mean statistics for players on teams E or W:

pivotTable.query('team == ["E", "W"]')

     points  rebounds
team 
   E   23.0      13.0
   W   23.0      11.0

You can find the complete documentation for the pivot_table function here.

Leave a Reply

Your email address will not be published. Required fields are marked *