How to Use the ntile() Function in dplyr (With Examples)


You can use the ntile() function from the dplyr package in R to divide an input vector into n ranked buckets of roughly equal size.

This function uses the following basic syntax:

ntile(x, n)

where:

  • x: Input vector
  • n: Number of buckets

Note: If the length of x is not a multiple of n, the bucket sizes will differ by up to one. Recent versions of dplyr place the larger buckets first.
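Because ntile() aims for evenly sized buckets, tied values are not guaranteed to land in the same bucket. The following is a minimal sketch of that behavior using a made-up vector; the names tied and tied_buckets are illustrative only:

```r
library(dplyr)

#vector in which every value is tied (illustrative data)
tied <- c(5, 5, 5, 5)

#ties are broken by position, so identical values are split across buckets
tied_buckets <- ntile(tied, 2)

#view bucket assignments
tied_buckets
```

Here the first two values land in bucket 1 and the last two in bucket 2, even though all four values are identical.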

The following examples show how to use this function in practice.

Example 1: Use ntile() with a Vector

The following code shows how to use the ntile() function to break up a vector with 11 elements into 5 different buckets:

library(dplyr)

#create vector
x <- c(1, 3, 4, 6, 7, 8, 10, 13, 19, 22, 23)

#break up vector into 5 buckets
ntile(x, 5)

 [1] 1 1 1 2 2 3 3 4 4 5 5

From the output we can see that each element from the original vector has been placed into one of five buckets.

The smallest values are assigned to bucket 1 while the largest values are assigned to bucket 5.

For example:

  • The smallest values of 1, 3, and 4 are assigned to bucket 1.
  • The largest values of 22 and 23 are assigned to bucket 5.
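As a quick check on the earlier note about bucket sizes, the assignments can be tabulated with the base R table() function. This is a minimal sketch that recreates the vector from this example; the name buckets is introduced here only for illustration:

```r
library(dplyr)

#recreate the vector from Example 1
x <- c(1, 3, 4, 6, 7, 8, 10, 13, 19, 22, 23)

#store the bucket assignments (buckets is an illustrative name)
buckets <- ntile(x, 5)

#count how many elements fall into each bucket
table(buckets)
```

With 11 elements and 5 buckets, bucket 1 holds three elements and each of the other buckets holds two.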

Example 2: Use ntile() with a Data Frame

Suppose we have the following data frame in R that shows the points scored by various basketball players:

#create data frame
df <- data.frame(player=LETTERS[1:9],
                 points=c(12, 19, 7, 22, 24, 28, 30, 19, 15))

#view data frame
df

  player points
1      A     12
2      B     19
3      C      7
4      D     22
5      E     24
6      F     28
7      G     30
8      H     19
9      I     15

The following code shows how to use the ntile() function to create a new column in the data frame that assigns each player into one of three buckets, depending on their points scored:

library(dplyr)

#create new column that assigns players into buckets based on points
df$bucket <- ntile(df$points, 3)

#view updated data frame
df

  player points bucket
1      A     12      1
2      B     19      2
3      C      7      1
4      D     22      2
5      E     24      3
6      F     28      3
7      G     30      3
8      H     19      2
9      I     15      1

The new bucket column assigns a value of 1, 2, or 3 to each player, with three players in each bucket.

The players with the lowest points receive a value of 1 and the players with the highest points receive a value of 3.
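The same column could also be created inside a dplyr pipeline with mutate(). The sketch below is one possible variation, not the only approach; the names df_bucketed and bucket_desc are hypothetical, and wrapping the column in desc() reverses the ranking so that the highest scorers receive a value of 1:

```r
library(dplyr)

#recreate the data frame from Example 2
df <- data.frame(player=LETTERS[1:9],
                 points=c(12, 19, 7, 22, 24, 28, 30, 19, 15))

#create bucket columns inside a pipeline
#bucket: lowest points get 1; bucket_desc: highest points get 1
df_bucketed <- df %>%
  mutate(bucket = ntile(points, 3),
         bucket_desc = ntile(desc(points), 3))

#view result
df_bucketed
```

In this data the bucket column matches the one created above, while bucket_desc places players E, F, and G in bucket 1 and players A, C, and I in bucket 3.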

Additional Resources

The following tutorials explain how to use other common functions in R:

How to Use the across() Function in dplyr
How to Use the relocate() Function in dplyr
How to Use the slice() Function in dplyr
