The **cut()** function in R can be used to cut a range of values into bins and specify labels for each bin.

This function uses the following syntax:

**cut(x, breaks, labels = NULL, …)**

where:

- **x**: Name of vector
- **breaks**: Number of breaks to make or vector of break points
- **labels**: Labels for the resulting bins

The following examples show how to use this function in different scenarios with the following data frame in R:

    #create data frame
    df <- data.frame(player=c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'),
                     points=c(4, 7, 8, 12, 14, 16, 20, 26, 36))

    #view data frame
    df

      player points
    1      A      4
    2      B      7
    3      C      8
    4      D     12
    5      E     14
    6      F     16
    7      G     20
    8      H     26
    9      I     36

**Example 1: Cut Vector Based on Number of Breaks**

The following code shows how to use the **cut()** function to create a new column called **category** that cuts the **points** column into four bins of equal width:

    #create new column that places each player into four categories based on points
    df$category <- cut(df$points, breaks=4)

    #view updated data frame
    df

      player points  category
    1      A      4 (3.97,12]
    2      B      7 (3.97,12]
    3      C      8 (3.97,12]
    4      D     12 (3.97,12]
    5      E     14   (12,20]
    6      F     16   (12,20]
    7      G     20   (12,20]
    8      H     26   (20,28]
    9      I     36   (28,36]

Since we specified **breaks=4**, the **cut()** function split the values in the **points** column into four bins of equal width.

Here is how the **cut()** function did this:

- First, it found the difference between the largest and smallest values in the points column (36 – 4 = 32)
- Then, it divided this difference by 4 (32 / 4 = 8)
- The result is four bins each with a width of 8
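The arithmetic in these steps can be reproduced by hand. The following is a minimal sketch rather than the internal implementation of **cut()**, and the names **rng**, **width**, and **edges** are chosen here purely for illustration:

```r
#points values from the data frame above
points <- c(4, 7, 8, 12, 14, 16, 20, 26, 36)

#difference between the largest and smallest values
rng <- max(points) - min(points)

#width of each of the four bins
width <- rng / 4

#five equally spaced edges that bound the four bins
edges <- seq(min(points), max(points), length.out=5)

#view the results
rng
width
edges
```

The edges work out to 4, 12, 20, 28, and 36, which match the interval limits in the **category** column apart from the slightly lowered first limit.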

**Note**: The lower limit of the first interval is 3.97 instead of 4 because of the following behavior described in the **cut()** documentation:

When breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals.
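To see where 3.97 comes from, the adjustment can be applied by hand. This is only a sketch, assuming the shift is exactly 0.1% of the range as the documentation states; the names **shift**, **edges**, and **manual** are illustrative:

```r
#points values from the data frame above
points <- c(4, 7, 8, 12, 14, 16, 20, 26, 36)

#0.1% of the range of the data
shift <- 0.001 * (max(points) - min(points))

#equally spaced edges, with the two outer edges moved outward
edges <- seq(min(points), max(points), length.out=5)
edges[1] <- edges[1] - shift
edges[5] <- edges[5] + shift

#bin the values using the hand-built edges
manual <- cut(points, breaks=edges)

#compare bin membership with the result of breaks=4
identical(as.integer(manual), as.integer(cut(points, breaks=4)))
```

The shift is 0.032, so the lowest edge becomes 4 - 0.032 = 3.968, which is displayed as 3.97 when rounded to three significant digits.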

**Example 2: Cut Vector Based on Specific Break Points**

The following code shows how to use the **cut()** function to create a new column called **category** that cuts the **points** column based on a vector of specific break points:

    #create new column based on specific break points
    df$category <- cut(df$points, breaks=c(0, 10, 15, 20, 40))

    #view updated data frame
    df

      player points category
    1      A      4   (0,10]
    2      B      7   (0,10]
    3      C      8   (0,10]
    4      D     12  (10,15]
    5      E     14  (10,15]
    6      F     16  (15,20]
    7      G     20  (15,20]
    8      H     26  (20,40]
    9      I     36  (20,40]

The **cut()** function categorized each player into bins based on the specific vector of break points we provided.
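The interval notation (0,10] means that each bin excludes its lower limit and includes its upper limit, which is why the player with 20 points lands in (15,20]. As a small sketch of one consequence, using a made-up vector called **scores**, a value that falls outside the break points or sits exactly on the lowest break point is returned as NA:

```r
#hypothetical values: 0 sits on the lowest break point, 45 is above the highest
scores <- c(0, 10, 20, 45)

#bin the values using the same break points as above
binned <- cut(scores, breaks=c(0, 10, 15, 20, 40))

#convert the factor to text to inspect the assigned bins
as.character(binned)
```

Here the values 10 and 20 are placed in (0,10] and (15,20], while 0 and 45 come back as NA. If the lowest break point should be included, the **include.lowest=TRUE** argument of **cut()** is one option to consider.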

**Example 3: Cut Vector Using Specific Break Points and Labels**

The following code shows how to use the **cut()** function to create a new column called **category** that cuts the **points** column based on a vector of specific break points with custom labels:

    #create new column based on values in points column
    df$category <- cut(df$points,
                       breaks=c(0, 10, 15, 20, 40),
                       labels=c('Bad', 'OK', 'Good', 'Great'))

    #view updated data frame
    df

      player points category
    1      A      4      Bad
    2      B      7      Bad
    3      C      8      Bad
    4      D     12       OK
    5      E     14       OK
    6      F     16     Good
    7      G     20     Good
    8      H     26    Great
    9      I     36    Great

The new **category** column classifies each player as Bad, OK, Good, or Great depending on their corresponding value in the **points** column.
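The result of **cut()** is a factor whose levels follow the order of the labels. As a quick check, sketched here on the points values alone rather than on the full data frame, the levels and the number of players in each bin can be inspected with **levels()** and **table()**:

```r
#points values from the data frame above
points <- c(4, 7, 8, 12, 14, 16, 20, 26, 36)

#cut the values using the same break points and labels
category <- cut(points,
                breaks=c(0, 10, 15, 20, 40),
                labels=c('Bad', 'OK', 'Good', 'Great'))

#view the factor levels in order
levels(category)

#count the number of players in each bin
table(category)
```

With this data, three players fall in the Bad bin and two in each of the other bins.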

**Note**: The number of labels should always be one less than the number of break points to avoid the following error:

    Error in cut.default(df$points, breaks = c(0, 10, 15, 20, 40), labels = c("Bad", :
      lengths of 'breaks' and 'labels' differ
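One way to guard against this, sketched here with an illustrative pair of variables named **breaks** and **labels**, is to compare the two lengths before calling **cut()**. The **tryCatch()** call then shows the mismatch being caught when one label is missing. The exact wording of the error message may differ between R versions:

```r
#break points and labels to check
breaks <- c(0, 10, 15, 20, 40)
labels <- c('Bad', 'OK', 'Good', 'Great')

#five break points define four bins, so four labels are needed
length(labels) == length(breaks) - 1

#dropping one label triggers the error, which tryCatch() intercepts
result <- tryCatch(
  cut(c(4, 12, 26), breaks=breaks, labels=labels[1:3]),
  error=function(e) conditionMessage(e)
)

#view the captured error message
result
```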

