How to Bin Variables in Python Using numpy.digitize()


Often you may be interested in placing the values of a variable into “bins” in Python. Fortunately this is easy to do using the numpy.digitize() function, which uses the following syntax:

numpy.digitize(x, bins, right=False)

where:

  • x: Array of values to be binned.
  • bins: One-dimensional array of bin edges, which must be monotonically increasing or decreasing.
  • right: Whether each interval includes the right bin edge instead of the left. The default (False) includes the left edge and excludes the right.
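
To make the meaning of the returned indices concrete, here is a minimal sketch using made-up values (the arrays x and bins below are illustrative, not taken from the examples that follow). With the default right=False and increasing bin edges, each index i satisfies bins[i-1] <= x < bins[i], so indices run from 0 to len(bins), and the result matches np.searchsorted with side="right":

```python
import numpy as np

# Illustrative values, chosen so that some fall exactly on a bin edge
x = np.array([5, 10, 15, 20, 25])
bins = np.array([10, 20])  # increasing bin edges

# Default right=False: index i means bins[i-1] <= x < bins[i]
idx = np.digitize(x, bins)
print(idx)  # [0 1 1 2 2]

# Indices range from 0 (below the first edge) to len(bins) (at or above the last edge)
print(idx.min(), idx.max())  # 0 2

# For increasing bins, this matches searchsorted with side="right"
same = np.array_equal(idx, np.searchsorted(bins, x, side="right"))
print(same)  # True
```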

This tutorial shows several examples of how to use this function in practice.

Example 1: Place All Values into Two Bins

The following code shows how to place the values of an array into two bins:

  • 0 if x < 20
  • 1 if x ≥ 20

import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 19, 20, 24, 31, 34]

#place values into bins
np.digitize(data, bins=[20])

array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
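
Because the result is an array of integer indices, it can be used to look up a readable label for each value. The sketch below assumes two hypothetical label strings, which are not part of NumPy or of the example above:

```python
import numpy as np

data = [2, 4, 4, 7, 12, 14, 19, 20, 24, 31, 34]
idx = np.digitize(data, bins=[20])

# Hypothetical labels, one per bin index (0 and 1)
labels = np.array(["below 20", "20 or above"])

# Index the label array with the bin indices to name each value's bin
named = labels[idx]
print(named[6])  # below 20     (the value 19)
print(named[7])  # 20 or above  (the value 20)
```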

Example 2: Place All Values into Three Bins

The following code shows how to place the values of an array into three bins:

  • 0 if x < 10
  • 1 if 10 ≤ x < 20
  • 2 if x ≥ 20

import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34]

#place values into bins
np.digitize(data, bins=[10, 20])

array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2])

Note that if we specify right=True then the values would be placed into the following bins:

  • 0 if x ≤ 10
  • 1 if 10 < x ≤ 20
  • 2 if x > 20

Each interval would include the right bin edge. Here’s what that looks like:

import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34]

#place values into bins
np.digitize(data, bins=[10, 20], right=True)

array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
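
Only values that sit exactly on a bin edge are affected by the right argument. As a quick check, the following sketch compares both settings on the same data and picks out the values whose bin index changes (the variable names are illustrative):

```python
import numpy as np

data = np.array([2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34])
edges = [10, 20]

left = np.digitize(data, edges)               # default, right=False
right = np.digitize(data, edges, right=True)  # intervals include the right edge

# Values whose bin index depends on the `right` setting
changed = data[left != right]
print(changed)  # [20]
```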

Example 3: Place All Values into Four Bins

The following code shows how to place the values of an array into four bins:

  • 0 if x < 10
  • 1 if 10 ≤ x < 20
  • 2 if 20 ≤ x < 30
  • 3 if x ≥ 30

import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34]

#place values into bins
np.digitize(data, bins=[10, 20, 30])

array([0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3])
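
To see which original values ended up in each bin, the indices can be used as a boolean mask. This is one possible sketch; the dictionary layout is just a convenient choice for display:

```python
import numpy as np

data = np.array([2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34])
idx = np.digitize(data, bins=[10, 20, 30])

# Collect the original values that fall into each bin index
groups = {int(b): data[idx == b].tolist() for b in np.unique(idx)}
print(groups)
# {0: [2, 4, 4, 7], 1: [12, 14], 2: [20, 22, 24], 3: [31, 34]}
```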

Example 4: Count the Frequency of Each Bin

Another useful NumPy function that complements the numpy.digitize() function is the numpy.bincount() function, which counts the frequencies of each bin.

The following code shows how to place the values of an array into three bins and then count the frequency of each bin:

import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34]

#place values into bins
bin_data = np.digitize(data, bins=[10, 20])

#view binned data
bin_data

array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2])

#count frequency of each bin
np.bincount(bin_data)

array([4, 2, 5])

The output tells us that:

  • Bin “0” contains 4 data values.
  • Bin “1” contains 2 data values.
  • Bin “2” contains 5 data values.
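
One caveat: np.bincount() only returns counts up to the largest index present, so an empty final bin is dropped from the output. A minimal sketch of one way to handle this, assuming a shortened data set with no values of 20 or more, is to pass minlength so that every bin is reported:

```python
import numpy as np

# Shortened data set (an assumption for illustration): nothing is >= 20
data = [2, 4, 4, 7, 12, 14]
edges = [10, 20]
idx = np.digitize(data, edges)

# Without minlength, the empty last bin is missing from the result
short_counts = np.bincount(idx)
print(short_counts)  # [4 2]

# len(edges) + 1 is the total number of bins that digitize can produce
full_counts = np.bincount(idx, minlength=len(edges) + 1)
print(full_counts)  # [4 2 0]
```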
