The cut() function in R can be used to cut a range of values into bins and specify labels for each bin.
This function uses the following syntax:
cut(x, breaks, labels = NULL, ...)
where:
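- x: A numeric vector to cut into bins
- breaks: Either a single number (2 or greater) giving the number of bins, or a numeric vector of break points
- labels: Labels for the resulting bins. By default, labels are built from the interval limits, such as (0,10]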
The following examples show how to use this function in different scenarios with the following data frame in R:
#create data frame
df <- data.frame(player=c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'),
                 points=c(4, 7, 8, 12, 14, 16, 20, 26, 36))
#view data frame
df
player points
1 A 4
2 B 7
3 C 8
4 D 12
5 E 14
6 F 16
7 G 20
8 H 26
9 I 36
Example 1: Cut Vector Based on Number of Breaks
The following code shows how to use the cut() function to create a new column called category that cuts the points column into four bins of equal width:
#create new column that places each player into four categories based on points
df$category <- cut(df$points, breaks=4)
#view updated data frame
df
player points category
1 A 4 (3.97,12]
2 B 7 (3.97,12]
3 C 8 (3.97,12]
4 D 12 (3.97,12]
5 E 14 (12,20]
6 F 16 (12,20]
7 G 20 (12,20]
8 H 26 (20,28]
9 I 36 (28,36]
Since we specified breaks=4, the cut() function split the values in the points column into four bins of equal width. Note that the bins are equal in width, not in the number of players they contain.
Here is how the cut() function did this:
- First, it found the difference between the largest and smallest values in the points column (36 – 4 = 32)
- Then, it divided this difference by 4 (32 / 4 = 8)
- The result is four bins each with a width of 8
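Note: The lower limit of the first interval is 3.97 instead of 4 (and the upper limit of the last interval is actually 36.032, which is displayed as 36) because of the following behavior described in the cut() documentation: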
When breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals.
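The steps and the documentation note above can be reproduced by hand. The following is a minimal sketch that assumes the same points values; the names n_bins and manual_breaks are introduced here for illustration only and are not part of cut():

```r
#same values as the points column
points <- c(4, 7, 8, 12, 14, 16, 20, 26, 36)
n_bins <- 4

#find the range of the data and the width of each bin
rng <- range(points)          #4 and 36
width <- diff(rng) / n_bins   #(36 - 4) / 4 = 8

#create equally spaced break points: 4, 12, 20, 28, 36
manual_breaks <- seq(rng[1], rng[2], length.out = n_bins + 1)

#move the outer limits away by 0.1% of the range (32 / 1000 = 0.032)
manual_breaks[1] <- rng[1] - diff(rng) / 1000            #3.968
manual_breaks[n_bins + 1] <- rng[2] + diff(rng) / 1000   #36.032

#compare the bins from breaks=4 with the bins from the manual break points
auto_bins <- cut(points, breaks = n_bins)
manual_bins <- cut(points, breaks = manual_breaks)
identical(as.integer(auto_bins), as.integer(manual_bins))
```

If the comparison returns TRUE, both approaches place every value in the same bin, which illustrates why the smallest value (4) is not left outside the first interval.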
Example 2: Cut Vector Based on Specific Break Points
The following code shows how to use the cut() function to create a new column called category that cuts the points column based on a vector of specific break points:
#create new column based on specific break points
df$category <- cut(df$points, breaks=c(0, 10, 15, 20, 40))
#view updated data frame
df
player points category
1 A 4 (0,10]
2 B 7 (0,10]
3 C 8 (0,10]
4 D 12 (10,15]
5 E 14 (10,15]
6 F 16 (15,20]
7 G 20 (15,20]
8 H 26 (20,40]
9 I 36 (20,40]
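The cut() function categorized each player into bins based on the specific vector of break points we provided. By default, each interval excludes its lower limit and includes its upper limit, which is why player G with 20 points falls into (15,20] rather than (20,40].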
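It can help to see how values on or outside the break points are handled. The following is a small sketch using the same break points and a hypothetical vector x chosen only to hit the edge cases; right is an existing argument of cut() whose default is TRUE:

```r
#same break points as above
break_points <- c(0, 10, 15, 20, 40)

#hypothetical values that sit on or outside the break points
x <- c(0, 10, 20, 45)

#default behavior: intervals are open on the left and closed on the right
default_bins <- cut(x, breaks = break_points)
as.character(default_bins)   #NA "(0,10]" "(15,20]" NA

#right=FALSE: intervals are closed on the left and open on the right
left_closed <- cut(x, breaks = break_points, right = FALSE)
as.character(left_closed)    #"[0,10)" "[10,15)" "[20,40)" NA
```

Values that do not fall into any interval, such as 45 here, are returned as NA, so it is worth choosing break points that cover the full range of the data.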
Example 3: Cut Vector Using Specific Break Points and Labels
The following code shows how to use the cut() function to create a new column called category that cuts the points column based on a vector of specific break points with custom labels:
#create new column based on values in points column
df$category <- cut(df$points,
                   breaks=c(0, 10, 15, 20, 40),
                   labels=c('Bad', 'OK', 'Good', 'Great'))
#view updated data frame
df
player points category
1 A 4 Bad
2 B 7 Bad
3 C 8 Bad
4 D 12 OK
5 E 14 OK
6 F 16 Good
7 G 20 Good
8 H 26 Great
9 I 36 Great
The new category column classifies each player as Bad, OK, Good, or Great depending on their corresponding value in the points column.
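Because cut() returns a factor, the labeled column can be summarized directly. The following is a brief sketch that rebuilds the same data frame and counts the players in each category; the name category_counts is only illustrative:

```r
#recreate the data frame and the labeled category column
df <- data.frame(player=c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'),
                 points=c(4, 7, 8, 12, 14, 16, 20, 26, 36))
df$category <- cut(df$points,
                   breaks=c(0, 10, 15, 20, 40),
                   labels=c('Bad', 'OK', 'Good', 'Great'))

#factor levels follow the order of the break points
levels(df$category)   #"Bad" "OK" "Good" "Great"

#count the number of players in each category
category_counts <- table(df$category)
category_counts       #Bad: 3, OK: 2, Good: 2, Great: 2
```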
Note: When breaks is a vector of break points, the number of labels must be exactly one less than the number of break points (one label per bin). Otherwise, cut() throws an error like the following:
Error in cut.default(df$points, breaks = c(0, 10, 15, 20, 40), labels = c("Bad", :
lengths of 'breaks' and 'labels' differ
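One way to guard against this error is to check the lengths before calling cut(). The following is a minimal sketch; the variable names and the extra 'Elite' label are hypothetical and used only to trigger the mismatch:

```r
break_points <- c(0, 10, 15, 20, 40)
bin_labels <- c('Bad', 'OK', 'Good', 'Great')

#five break points define four bins, so four labels are needed
stopifnot(length(bin_labels) == length(break_points) - 1)
category <- cut(c(4, 12, 16, 36), breaks = break_points, labels = bin_labels)
as.character(category)   #"Bad" "OK" "Good" "Great"

#supplying five labels for four bins raises an error, captured here with tryCatch()
too_many_labels <- c(bin_labels, 'Elite')
result <- tryCatch(
  cut(c(4, 12, 16, 36), breaks = break_points, labels = too_many_labels),
  error = function(e) conditionMessage(e)
)

#result holds the error message as text instead of a factor
is.character(result)
```

The exact wording of the error message may differ between R versions, so the sketch only checks that an error was raised rather than matching its text.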