How to Use the cut() Function in R


The cut() function in R divides the values of a numeric vector into bins (intervals) and, optionally, assigns a label to each bin. It returns a factor.

This function uses the following syntax:

cut(x, breaks, labels = NULL, …)

where:

  • x: A numeric vector to cut into bins
  • breaks: Either the number of bins to create or a vector of break points
  • labels: Labels for the resulting bins (by default, labels are built from the interval boundaries)
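As a quick illustration of this syntax, the sketch below bins a small made-up vector (the name scores and its values are assumptions used only for illustration) and checks that the result is a factor whose levels are the bins:

```r
#hypothetical vector used only for illustration
scores <- c(1, 5, 9)

#cut the vector into two bins using explicit break points
bins <- cut(scores, breaks=c(0, 5, 10))

#cut() returns a factor
class(bins)

#the levels of the factor are the interval labels
levels(bins)
```

Because the result is a factor, it can be passed directly to functions such as table() or used as a grouping variable.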

The following examples show how to use this function in different scenarios with the following data frame in R:

#create data frame
df <- data.frame(player=c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'),
                 points=c(4, 7, 8, 12, 14, 16, 20, 26, 36))

#view data frame
df

  player points
1      A      4
2      B      7
3      C      8
4      D     12
5      E     14
6      F     16
7      G     20
8      H     26
9      I     36

Example 1: Cut Vector Based on Number of Breaks

The following code shows how to use the cut() function to create a new column called category that cuts the points column into four bins of equal width:

#create new column that places each player into one of four categories based on points
df$category <- cut(df$points, breaks=4)

#view updated data frame
df

  player points  category
1      A      4 (3.97,12]
2      B      7 (3.97,12]
3      C      8 (3.97,12]
4      D     12 (3.97,12]
5      E     14   (12,20]
6      F     16   (12,20]
7      G     20   (12,20]
8      H     26   (20,28]
9      I     36   (28,36]

Since we specified breaks=4, the cut() function split the values in the points column into four bins of equal width.

Here is how the cut() function did this:

  • First, it found the difference between the largest and smallest values in the points column (36 – 4 = 32)
  • Then, it divided this difference by 4 (32 / 4 = 8)
  • The result is four bins each with a width of 8
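The arithmetic in these steps can be reproduced by hand. The following is a minimal sketch, not the internal code of cut(); the variable names pts, rng, width, and edges are assumptions introduced for illustration:

```r
#points values from the data frame above
pts <- c(4, 7, 8, 12, 14, 16, 20, 26, 36)

#difference between the largest and smallest values
rng <- max(pts) - min(pts)

#width of each of the four bins
width <- rng / 4

#equally spaced bin edges, before any adjustment of the outer limits
edges <- seq(min(pts), max(pts), length.out=5)
edges
```

Four bins require five edges, which is why length.out is set to 5.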

Note: The lower limit of the first interval is shown as 3.97 instead of 4 because of the following behavior described in the cut() documentation:

When breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals.
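To see where 3.97 comes from, the adjustment described in this passage can be applied manually. This is a sketch of the documented behavior under the assumption that the outer limits are each moved by 0.1% of the range; the names pts, edges, and adj are hypothetical:

```r
#points values from the data frame above
pts <- c(4, 7, 8, 12, 14, 16, 20, 26, 36)

#equally spaced break points across the range of the data
edges <- seq(min(pts), max(pts), length.out=5)

#move the outer limits away by 0.1% of the range (32 / 1000 = 0.032)
adj <- diff(range(pts)) / 1000
edges[1] <- edges[1] - adj
edges[5] <- edges[5] + adj
edges

#check that the manual break points place every value in the same bin as breaks=4
same_bins <- all(as.integer(cut(pts, breaks=edges)) == as.integer(cut(pts, breaks=4)))
same_bins
```

The adjusted lower limit is 3.968, which cut() displays as 3.97 because interval labels are formatted to three significant digits by default.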

Example 2: Cut Vector Based on Specific Break Points

The following code shows how to use the cut() function to create a new column called category that cuts the points column based on a vector of specific break points:

#create new column based on specific break points
df$category <- cut(df$points, breaks=c(0, 10, 15, 20, 40))

#view updated data frame
df

  player points category
1      A      4   (0,10]
2      B      7   (0,10]
3      C      8   (0,10]
4      D     12  (10,15]
5      E     14  (10,15]
6      F     16  (15,20]
7      G     20  (15,20]
8      H     26  (20,40]
9      I     36  (20,40]

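The cut() function placed each player into a bin based on the vector of break points we provided. Each bin is open on the left and closed on the right, so a player with exactly 20 points falls in (15,20], not (20,40].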
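Values that land exactly on a break point, or outside the break points altogether, are worth checking before relying on the bins. The sketch below uses a made-up vector (vals and brks are assumptions for illustration) together with the include.lowest and right arguments of cut():

```r
#hypothetical values chosen to sit on and outside the break points
vals <- c(0, 10, 20, 45)
brks <- c(0, 10, 20, 40)

#default: intervals are (a,b], so 0 and 45 fall in no bin and become NA
default_bins <- cut(vals, breaks=brks)
default_bins

#include.lowest=TRUE closes the first interval on the left, so 0 is kept
lowest_bins <- cut(vals, breaks=brks, include.lowest=TRUE)
lowest_bins

#right=FALSE switches to [a,b) intervals, so 10 moves up to the next bin
left_bins <- cut(vals, breaks=brks, right=FALSE)
left_bins
```

If NA values appear in a binned column unexpectedly, the usual cause is a value that lies outside the outermost break points.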

Example 3: Cut Vector Using Specific Break Points and Labels

The following code shows how to use the cut() function to create a new column called category that cuts the points column based on a vector of specific break points with custom labels:

#create new column based on specific break points with custom labels
df$category <- cut(df$points,
                   breaks=c(0, 10, 15, 20, 40),
                   labels=c('Bad', 'OK', 'Good', 'Great'))

#view updated data frame
df

  player points category
1      A      4      Bad
2      B      7      Bad
3      C      8      Bad
4      D     12       OK
5      E     14       OK
6      F     16     Good
7      G     20     Good
8      H     26    Great
9      I     36    Great

The new category column classifies each player as Bad, OK, Good, or Great depending on their corresponding value in the points column.
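One common follow-up is to count how many players fall into each category. The sketch below rebuilds the same data frame so that it runs on its own, then uses table() on the new column:

```r
#recreate the data frame from the examples above
df <- data.frame(player=c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'),
                 points=c(4, 7, 8, 12, 14, 16, 20, 26, 36))

#create the labeled category column
df$category <- cut(df$points,
                   breaks=c(0, 10, 15, 20, 40),
                   labels=c('Bad', 'OK', 'Good', 'Great'))

#count the number of players in each category
counts <- table(df$category)
counts
```

The categories appear in the order the labels were supplied rather than alphabetically, because cut() uses the labels as the factor levels.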

Note: The number of labels must be one less than the number of break points, because n break points define n – 1 bins. Otherwise, cut() throws an error such as the following:

Error in cut.default(df$points, breaks = c(0, 10, 15, 20, 40), labels = c("Bad",  : 
  lengths of 'breaks' and 'labels' differ
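A simple way to guard against this error is to compare the lengths before calling cut(). The following is a minimal sketch; the names brks, labs, and msg are assumptions for illustration:

```r
brks <- c(0, 10, 15, 20, 40)
labs <- c('Bad', 'OK', 'Good', 'Great')

#five break points define four bins, so four labels are needed
length(brks) - 1
length(labs)

#stop early if the number of labels does not match the number of bins
stopifnot(length(labs) == length(brks) - 1)

#supplying only three labels triggers the error; capture its message instead of halting
msg <- tryCatch(cut(c(4, 12), breaks=brks, labels=c('Bad', 'OK', 'Good')),
                error=function(e) conditionMessage(e))
msg
```

The exact wording of the captured message may differ between R versions.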

Additional Resources

The following tutorials explain how to use other common functions in R:

How to Use tabulate() Function in R
How to Use split() Function in R
How to Use match() Function in R
How to Use replicate() Function in R
