dplyr: How to Slice First and Last Row in Each Group


Often you may want to select the first and last row in each group of a data frame in R.

Fortunately this is easy to do by using the following basic syntax with the dplyr package in R:

library(dplyr)

df %>%
  group_by(team) %>%
  filter(row_number() %in% c(1, n())) 

This particular example groups the rows of the data frame by the values in the team column and then returns the first and last row from each group.

This syntax uses the filter() function to keep rows whose row_number() within the group is equal to either 1 or n(). Because n() returns the number of rows in the current group, it is also the position of the last row in that group.
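To see what row_number() and n() evaluate to inside each group, here is a minimal sketch that stores them as columns. The data frame toy and the column names idx, size and keep are hypothetical and used only for illustration:

```r
library(dplyr)

#hypothetical toy data, used only for illustration
toy <- data.frame(team=c('A', 'A', 'A', 'B', 'B'),
                  points=c(10, 20, 30, 40, 50))

#store the within-group row index and the group size as columns
toy_idx <- toy %>%
  group_by(team) %>%
  mutate(idx = row_number(),           #1, 2, 3, ... restarting in each group
         size = n(),                   #number of rows in the current group
         keep = idx %in% c(1, size)) %>%
  ungroup()

#view result
toy_idx
```

The rows where keep is TRUE are the rows that filter(row_number() %in% c(1, n())) would return.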

The following example shows how to use this syntax in practice.

Note: You may need to first use the following syntax to install the dplyr package:

install.packages('dplyr')

Once the dplyr package is installed, you can then use the various functions from it to return the first and last row from each group.

Example: How to Slice First and Last Row in Each Group Using dplyr

Suppose we create the following data frame that contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 45, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#view data frame
df

  team points assists rebounds
1    A     99      22       30
2    A     68      28       28
3    A     86      45       24
4    A     88      35       24
5    B     95      34       30
6    B     74      45       36
7    B     78      28       30
8    B     93      31       29

Notice that there are two unique teams in this data frame: A and B.

Suppose that we would like to select the first and last row from the data frame for each of these teams.

We can use the following syntax to do so:

library(dplyr)

#select first and last row for each team
df %>%
  group_by(team) %>%
  filter(row_number() %in% c(1, n()))

# A tibble: 4 x 4
# Groups:   team [2]
  team  points assists rebounds
  <chr>  <dbl>   <dbl>    <dbl>
1 A         99      22       30
2 A         88      35       24
3 B         95      34       30
4 B         93      31       29

This returns the first and last row for each team in the data frame.
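Since the goal is to slice rows by position, the same rows can also be selected with the slice() function from dplyr, which accepts row positions directly. The following is a sketch of that alternative, not the method used in the rest of this tutorial. The name first_last is hypothetical, and the data frame is recreated so the code runs on its own:

```r
library(dplyr)

#recreate the data frame from this example
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 45, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#select first and last row for each team by position
first_last <- df %>%
  group_by(team) %>%
  slice(unique(c(1, n())))

#view result
first_last
```

Wrapping the positions in unique() is a precaution for groups that contain a single row, where 1 and n() refer to the same row.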

Note that you could also choose to only return the first row from each group by using the following syntax:

library(dplyr)

#select first row for each team
df %>%
  group_by(team) %>%
  filter(row_number() %in% c(1))

# A tibble: 2 x 4
# Groups:   team [2]
  team  points assists rebounds
  <chr>  <dbl>   <dbl>    <dbl>
1 A         99      22       30
2 B         95      34       30

Notice that this returns only the first row for each team in the data frame.

Or we could also choose to only return the last row from each group by using the following syntax:

library(dplyr)

#select last row for each team
df %>%
  group_by(team) %>%
  filter(row_number() %in% c(n()))

# A tibble: 2 x 4
# Groups:   team [2]
  team  points assists rebounds
  <chr>  <dbl>   <dbl>    <dbl>
1 A         88      35       24
2 B         93      31       29

Notice that this returns only the last row for each team in the data frame.
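If you only need the first or the last row of each group, the slice_head() and slice_tail() functions offer a shorter alternative. This sketch assumes dplyr version 1.0.0 or later, where these functions are available. The names first_rows and last_rows are hypothetical:

```r
library(dplyr)

#recreate the data frame from this example
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 45, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#select first row for each team
first_rows <- df %>%
  group_by(team) %>%
  slice_head(n = 1)

#select last row for each team
last_rows <- df %>%
  group_by(team) %>%
  slice_tail(n = 1)
```

Here first_rows should contain the same rows as the filter(row_number() %in% c(1)) example, and last_rows the same rows as the filter(row_number() %in% c(n())) example.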

Feel free to specify whichever row numbers you would like after the %in% operator inside the filter() function to return those rows for each group.
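For instance, the following sketch returns the second and the second-to-last row from each group by using the positions 2 and n() - 1. The name second_rows is hypothetical:

```r
library(dplyr)

#recreate the data frame from this example
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 45, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#select second and second-to-last row for each team
second_rows <- df %>%
  group_by(team) %>%
  filter(row_number() %in% c(2, n() - 1))

#view result
second_rows
```

Since each team has four rows, this keeps rows 2 and 3 within each team.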

Also note that you can specify multiple variables to group by within the group_by() function if you would like.
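As a minimal sketch of grouping by more than one variable, suppose the data also had a position column. The data frame df2, the position column and its values are hypothetical and do not appear in the example above:

```r
library(dplyr)

#hypothetical data frame with a second grouping column
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                  position=c('G', 'G', 'F', 'F', 'G', 'G', 'G', 'F'),
                  points=c(99, 68, 86, 88, 95, 74, 78, 93))

#select first and last row for each combination of team and position
first_last_pos <- df2 %>%
  group_by(team, position) %>%
  filter(row_number() %in% c(1, n())) %>%
  ungroup()

#view result
first_last_pos
```

Note that the combination of team B and position F contains only one row, so that row is both the first and the last row of its group and is returned once.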

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Use slice_max() in dplyr
How to Rename Columns Using dplyr
How to Add Row to Data Frame Using dplyr
How to Use the pull() Function in dplyr
