How to Create Frequency Tables in Python


A frequency table is a table that displays how often each category or value occurs. This type of table is particularly useful for understanding the distribution of values in a dataset.

This tutorial explains how to create frequency tables in Python.

One-Way Frequency Table for a Series

To find the frequencies of individual values in a pandas Series, you can use the value_counts() function:

import pandas as pd

#define Series
data = pd.Series([1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 5])

#find frequencies of each value
data.value_counts()

3    4
1    3
4    2
5    1
2    1

You can add the argument sort=False if you don’t want the data values sorted by frequency:

data.value_counts(sort=False)

1    3
2    1
3    4
4    2
5    1

The way to interpret the output is as follows:

  • The value “1” occurs 3 times in the Series.
  • The value “2” occurs 1 time in the Series.
  • The value “3” occurs 4 times in the Series.

And so on.
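As a brief sketch of two related options, the counts can be ordered by data value with sort_index(), and relative frequencies can be requested with the normalize=True argument of value_counts(). The variable names counts and proportions are chosen here only for illustration:

```python
import pandas as pd

# same Series as in the example above
data = pd.Series([1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 5])

# counts ordered by the data values rather than by frequency
counts = data.value_counts().sort_index()

# relative frequencies: each count divided by the total number of values (11)
proportions = data.value_counts(normalize=True).sort_index()

print(counts)
print(proportions)
```

Sorting by index gives a stable ordering regardless of how ties in frequency are arranged.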

One-Way Frequency Table for a DataFrame

To find the frequencies of the values in a column of a pandas DataFrame, you can use the crosstab() function, which uses the following syntax:

crosstab(index, columns)

where:

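  • index: the values to group by in the rows (for example, a DataFrame column)
  • columns: the values to group by in the columns; passing a single string such as 'count' produces one frequency column with that name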

For example, suppose we have a DataFrame with information about the letter grade, age, and gender of 10 different students in a class. Here’s how to find the frequency for each letter grade:

#create data
df = pd.DataFrame({'Grade': ['A','A','A','B','B', 'B', 'B', 'C', 'D', 'D'],
                   'Age': [18, 18, 18, 19, 19, 20, 18, 18, 19, 19],
                   'Gender': ['M','M', 'F', 'F', 'F', 'M', 'M', 'F', 'M', 'F']})

#view data
df

	Grade	Age	Gender
0	    A	 18	     M
1	    A	 18	     M
2	    A	 18	     F
3	    B	 19	     F
4	    B	 19	     F
5	    B	 20	     M
6	    B	 18	     M
7	    C	 18	     F
8	    D	 19	     M
9	    D	 19	     F 	  

#find frequency of each letter grade
pd.crosstab(index=df['Grade'], columns='count')

col_0	count
Grade	
A	    3
B	    4
C	    1
D	    2

The way to interpret this is as follows:

  • 3 students received an ‘A’ in the class.
  • 4 students received a ‘B’ in the class.
  • 1 student received a ‘C’ in the class.
  • 2 students received a ‘D’ in the class.
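For a single column, the same counts can also be obtained with value_counts(), the function used for a Series earlier in this tutorial. The following is a minimal sketch rather than the only approach; the names grade_counts and tab are illustrative:

```python
import pandas as pd

# same DataFrame as in the example above
df = pd.DataFrame({'Grade': ['A','A','A','B','B', 'B', 'B', 'C', 'D', 'D'],
                   'Age': [18, 18, 18, 19, 19, 20, 18, 18, 19, 19],
                   'Gender': ['M','M', 'F', 'F', 'F', 'M', 'M', 'F', 'M', 'F']})

# a DataFrame column is a Series, so value_counts() applies directly
grade_counts = df['Grade'].value_counts().sort_index()

# crosstab version for comparison
tab = pd.crosstab(index=df['Grade'], columns='count')

print(grade_counts)
print(tab)
```

value_counts() returns a Series, while crosstab() returns a DataFrame, which can be more convenient when a second variable is added later.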

We can use a similar syntax to find the frequency counts for other columns. For example, here’s how to find frequency by age:

pd.crosstab(index=df['Age'], columns='count') 

col_0	count
Age	
18   	    5
19	    4
20	    1

The way to interpret this is as follows:

  • 5 students are 18 years old.
  • 4 students are 19 years old.
  • 1 student is 20 years old.

You can also easily display the frequencies as proportions of the entire dataset by dividing by the sum:

#define crosstab
tab = pd.crosstab(index=df['Age'], columns='count')

#find proportions 
tab/tab.sum()

col_0	count
Age	
18	  0.5
19	  0.4
20	  0.1

The way to interpret this is as follows:

  • 50% of students are 18 years old.
  • 40% of students are 19 years old.
  • 10% of students are 20 years old.
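As an alternative sketch, crosstab() also accepts a normalize argument that returns proportions directly, without dividing by the sum manually. The name tab_prop is used here only for illustration:

```python
import pandas as pd

# same DataFrame as in the example above
df = pd.DataFrame({'Grade': ['A','A','A','B','B', 'B', 'B', 'C', 'D', 'D'],
                   'Age': [18, 18, 18, 19, 19, 20, 18, 18, 19, 19],
                   'Gender': ['M','M', 'F', 'F', 'F', 'M', 'M', 'F', 'M', 'F']})

# normalize=True divides every count by the total number of observations
tab_prop = pd.crosstab(index=df['Age'], columns='count', normalize=True)

print(tab_prop)
```

With a single frequency column, this should give the same proportions as dividing the table by its sum.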

Two-Way Frequency Tables for a DataFrame

You can also create a two-way frequency table to display the frequencies for two different variables in the dataset. For example, here’s how to create a two-way frequency table for the variables Age and Grade:

pd.crosstab(index=df['Age'], columns=df['Grade'])


Grade	A	B	C	D
Age				
18	3	1	1	0
19	0	2	0	2
20	0	1	0	0

The way to interpret this is as follows:

  • There are 3 students who are 18 years old and received an ‘A’ in the class.
  • There is 1 student who is 18 years old and received a ‘B’ in the class.
  • There is 1 student who is 18 years old and received a ‘C’ in the class.
  • There are 0 students who are 18 years old and received a ‘D’ in the class.

And so on.
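Two optional arguments of crosstab() are often helpful with two-way tables: margins=True adds row and column totals labeled 'All', and normalize='index' converts each row to proportions. The sketch below assumes the same DataFrame as above; the names two_way and row_props are illustrative:

```python
import pandas as pd

# same DataFrame as in the example above
df = pd.DataFrame({'Grade': ['A','A','A','B','B', 'B', 'B', 'C', 'D', 'D'],
                   'Age': [18, 18, 18, 19, 19, 20, 18, 18, 19, 19],
                   'Gender': ['M','M', 'F', 'F', 'F', 'M', 'M', 'F', 'M', 'F']})

# two-way table with an extra 'All' row and column holding the totals
two_way = pd.crosstab(index=df['Age'], columns=df['Grade'], margins=True)

# proportions within each age group (each row sums to 1)
row_props = pd.crosstab(index=df['Age'], columns=df['Grade'], normalize='index')

print(two_way)
print(row_props)
```

Row proportions answer questions such as what share of 18-year-olds received each grade, which raw counts alone do not show directly.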

You can find the complete documentation for the crosstab() function in the official pandas documentation.
