How to Use the pull() Function in dplyr


Often you may want to extract a single column from a data frame in R and then perform various operations on the column.

One convenient way to do this is with the pull() function from the dplyr package, which is designed for exactly this task: it extracts a single column from a data frame and returns it as a vector.

The pull() function uses the following basic syntax:

pull(.data, var = -1, name = NULL, ...)

where:

  • .data: The data frame (or tibble) to extract the column from
  • var: The column to extract, specified as a literal variable name, a positive integer giving its position counted from the left, or a negative integer giving its position counted from the right
  • name: An optional column whose values are used as names for the returned vector, specified in the same way as var
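The name argument is not used in the example later in this tutorial, so here is a minimal sketch of it, assuming dplyr 1.0.0 or later (the version that added name). It reuses the team and points values from the basketball data frame shown below and labels each points value with its team:

```r
library(dplyr)

#team and points values from the basketball data frame used in this tutorial
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93))

#extract points as a vector whose names come from the team column
points_by_team <- df %>% pull(points, name = team)
points_by_team

#use the names to keep only the team B values
points_by_team[names(points_by_team) == 'B']
```

Because team repeats, the names repeat too. R allows duplicate names, but a lookup such as points_by_team['B'] returns only the first match, which is why this sketch filters with names() instead.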

By default (var = -1), the pull() function extracts the last column of the data frame, on the assumption that this is the most recently created column. To extract a different column, specify a different value for the var argument.
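To see this default in action, the following sketch calls pull() with no var argument, first on the basketball data frame from the example below and then after adding a column with mutate(). The total column is a hypothetical addition made only for this illustration:

```r
library(dplyr)

#basketball data frame from the example in this tutorial
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 45, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#no var given, so pull() uses var = -1 and returns the last column (rebounds)
last_col <- df %>% pull()
last_col

#mutate() appends the hypothetical total column at the end,
#so a bare pull() now returns that freshly created column
totals <- df %>%
  mutate(total = points + assists) %>%
  pull()
totals
```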

The following example shows how to use the pull() function from the dplyr package in practice.

Example: How to Use pull() in dplyr

Suppose we create the following data frame that contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 45, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#view data frame
df

  team points assists rebounds
1    A     99      22       30
2    A     68      28       28
3    A     86      45       24
4    A     88      35       24
5    B     95      34       30
6    B     74      45       36
7    B     78      28       30
8    B     93      31       29

Suppose that we would like to extract only the points column from the data frame.

We can use the following syntax with the pull() function to do so:

library(dplyr)

#extract points column from data frame
df %>% pull(points)

[1] 99 68 86 88 95 74 78 93

Notice that we specified the points column by name, and pull() returned only the values from that column as a vector.
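To make the "values only" point concrete, the sketch below compares pull() with dplyr's select(), which keeps the result as a data frame, and with base R's $ and [[ ]] operators, which return the same vector as pull(). This comparison is an illustration added here, not part of the original example:

```r
library(dplyr)

df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 45, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#pull() returns a plain numeric vector
points_vec <- df %>% pull(points)

#select() keeps the data frame structure (one column, eight rows)
points_df <- df %>% select(points)

class(points_vec)  #"numeric"
class(points_df)   #"data.frame"

#pull(points) matches base R's df$points and df[["points"]]
identical(points_vec, df$points)       #TRUE
identical(points_vec, df[["points"]])  #TRUE
```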

Another way to extract the points column is by referencing its index location from the left:

library(dplyr)

#extract points column using its position from the left
df %>% pull(2)

[1] 99 68 86 88 95 74 78 93

The points column is the second column from the left, so pull(2) also extracts only the points column.
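The sketch below checks that the name-based and position-based calls return identical vectors. It also passes the column name as a quoted string, which pull() accepts as well (this relies on how tidyselect resolves var and is not shown in the original tutorial):

```r
library(dplyr)

df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 45, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#three ways of pointing pull() at the same column
by_name     <- df %>% pull(points)
by_string   <- df %>% pull("points")
by_position <- df %>% pull(2)

identical(by_name, by_position)  #TRUE
identical(by_name, by_string)    #TRUE
```

Positional indexing depends on column order: if a column is later inserted before points, pull(2) will silently return a different column, while pull(points) still returns the right one.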

Another way to extract the points column is by referencing its index location from the right:

library(dplyr)

#extract points column using its position from the right
df %>% pull(-3)

[1] 99 68 86 88 95 74 78 93

The points column is the third column from the right, so pull(-3) also extracts only the points column.
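Counting from the right follows a simple rule: in a data frame with n columns, position -k from the right is the same column as position n - k + 1 from the left. For points, that is 4 - 3 + 1 = 2. The sketch below checks this rule for every column of the data frame (n_cols and the loop are illustrative helpers, not part of dplyr):

```r
library(dplyr)

df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 45, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

n_cols <- ncol(df)  #4 columns: team, points, assists, rebounds

#position -k from the right should match position n_cols - k + 1 from the left
for (k in 1:n_cols) {
  from_right <- df %>% pull(-k)
  from_left  <- df %>% pull(n_cols - k + 1)
  stopifnot(identical(from_right, from_left))
}

#for points: -3 from the right is 4 - 3 + 1 = 2 from the left
df %>% pull(-3)
```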

Note that because pull() returns a vector, you can also pipe its output into other functions.

For example, you can use pull() to extract the points column and then pipe it into the max() function to find the maximum value in the column:

library(dplyr)

#use pull() and max() together
df %>%
  pull(points) %>%
  max()

[1] 99

This returns 99, which represents the max value in the points column of the data frame.

Feel free to pipe the output of pull() into any function that accepts a vector, such as mean(), sum(), or sort().
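Because pull() returns a plain vector, it also works well as the last dplyr step in a longer pipeline. The sketch below (an illustrative example, not from the original tutorial) filters the data frame to team A before pulling the points column and averaging it:

```r
library(dplyr)

df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 45, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#keep team A rows, extract points as a vector, then average it
team_a_avg <- df %>%
  filter(team == 'A') %>%
  pull(points) %>%
  mean()

team_a_avg  #(99 + 68 + 86 + 88) / 4 = 85.25
```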

Note: You can find the complete documentation for the pull() function in the dplyr package reference, or by running ?dplyr::pull in R.

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Insert Row into Data Frame in R
How to Append Values to List in R
How to Convert Data Frame Column to List in R
How to Count Number of Elements in List in R
