How to Use top_n() in dplyr


Often you may want to select the top n rows from a data frame in R.

One way to do so is with the top_n() function from the dplyr package. Note that as of dplyr 1.0.0, top_n() is superseded by slice_max() and slice_min(), although it still works.

The top_n() function uses the following syntax:

top_n(x, n, wt)

where:

  • x: The data frame
  • n: Number of rows to return, ranked by wt (tied values can cause more than n rows to be returned)
  • wt: The variable to use for ordering; if not specified, defaults to the last variable in the data frame
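The arguments above can be sketched with a small call. This is a minimal illustration only: the scores data frame and its player and score columns are hypothetical and are not part of the example used later. It also shows that a negative value of n selects rows from the bottom instead of the top.

```r
library(dplyr)

# Hypothetical data frame, used only for illustration
scores <- data.frame(player = c("P1", "P2", "P3", "P4"),
                     score  = c(10, 40, 30, 20))

# Two rows with the highest score (rows keep their original order)
top_two <- top_n(scores, 2, wt = score)

# A negative n selects from the bottom: two rows with the lowest score
bottom_two <- top_n(scores, -2, wt = score)

top_two
bottom_two
```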

The following example shows how to use the top_n() function in practice.

Note: Before using the top_n() function, you may need to first install the dplyr package. You can use the following syntax to do so:

install.packages('dplyr')

Once the dplyr package has been installed, you can proceed to use the top_n() function.

Example: How to Use the top_n() Function in R

Suppose that we create the following data frame in R that contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#view data frame
df

  team points assists rebounds
1    A     99      22       30
2    A     68      28       28
3    A     86      31       24
4    A     88      35       24
5    B     95      34       30
6    B     74      45       36
7    B     78      28       30
8    B     93      31       29

Suppose that we would like to select the top six rows from the data frame.

We can use the following syntax with the top_n() function to do so:

library(dplyr)

#select top six rows from data frame
df %>% top_n(6)

  team points assists rebounds
1    A     99      22       30
2    A     68      28       28
3    B     95      34       30
4    B     74      45       36
5    B     78      28       30
6    B     93      31       29

By default, the top_n() function returns the six rows with the highest values in the last column of the data frame (the rebounds column). The rows are kept in their original order rather than sorted.
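As a quick check of the default behavior, the sketch below compares the call without wt to a call that names the rebounds column explicitly. The names implicit and explicit are chosen here for illustration. suppressMessages() is used because top_n() prints a message saying which column it selected when wt is omitted.

```r
library(dplyr)

# Same data frame as in the example above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

# wt omitted: top_n() falls back to the last column (rebounds)
implicit <- suppressMessages(df %>% top_n(6))

# wt given explicitly
explicit <- df %>% top_n(6, wt=rebounds)

# Compare the two results
all.equal(implicit, explicit)
```

Naming the column explicitly is usually the safer choice, since the result then does not depend on the column order of the data frame.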

If you’d like, you can use the wt argument to specify a different column to use to determine which rows should be returned.

For example, you can use the following syntax to specify that you’d like to return the top six rows with the highest values in the assists column:

library(dplyr)

#select top six rows with highest values in assists column
df %>% top_n(6, wt=assists)

  team points assists rebounds
1    A     68      28       28
2    A     86      31       24
3    A     88      35       24
4    B     95      34       30
5    B     74      45       36
6    B     78      28       30
7    B     93      31       29

This returns the rows with the six highest values in the assists column. Note that seven rows are returned instead of six: two players are tied at 28 assists, and top_n() keeps all tied rows.
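Because top_n() keeps tied rows, it can return more than n rows. If exactly six rows are needed, one option is the slice_max() function with with_ties=FALSE. The sketch below assumes dplyr 1.0.0 or later, where slice_max() is available. Unlike top_n(), slice_max() also sorts the result from highest to lowest.

```r
library(dplyr)

# Same data frame as in the example above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

# top_n() keeps both rows tied at 28 assists
with_ties <- df %>% top_n(6, wt=assists)

# slice_max() with with_ties=FALSE drops the extra tied row
exactly_six <- df %>% slice_max(assists, n=6, with_ties=FALSE)

nrow(with_ties)
nrow(exactly_six)
```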

You can also return the top n rows within each group defined by a particular column.

For example, you can use the following syntax to return the row with the highest value in the points column for each team:

library(dplyr)

#select row with highest value in points column, grouped by team
df %>% group_by(team) %>% top_n(1, wt=points)

# A tibble: 2 x 4
# Groups:   team [2]
  team  points assists rebounds
  <chr>  <dbl>   <dbl>    <dbl>
1 A         99      22       30
2 B         95      34       30
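The same pattern works for more than one row per group. As a sketch, the code below selects the two highest-scoring players on each team and then calls ungroup() so that later operations are not applied per team. The name top_two_per_team is chosen here for illustration.

```r
library(dplyr)

# Same data frame as in the example above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

# Select the two rows with the highest points within each team
top_two_per_team <- df %>%
  group_by(team) %>%
  top_n(2, wt=points) %>%
  ungroup()

top_two_per_team
```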

Note: The complete documentation for the top_n() function is available in the dplyr package reference.

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Create a Frequency Table by Group in R
How to Create a Frequency Polygon in R
How to Create Relative Frequency Tables in R
How to Calculate the Mode in R
