R: How to Select Specific Columns of data.table


Often you may want to select specific columns of a data.table in R.

You can use the following methods to do so:

Method 1: Select Columns of data.table By Name

dt[, c('team', 'assists')]

This particular example will select the team and assists columns from the data.table named dt.

Method 2: Select Columns of data.table Based on Vector

my_columns <- c('team', 'assists', 'points')

dt[, ..my_columns]

This particular example will select each column whose name appears in the my_columns vector from the data.table named dt.

Note that we must use the .. prefix when the column names are stored in a vector. The prefix tells data.table to look up my_columns as a variable outside of the data.table rather than as a column of dt.
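As a brief aside, the .. prefix is not the only way to select columns from a vector of names. The sketch below compares it with two other forms that data.table supports, with = FALSE and .SD with .SDcols. The table dt_demo and its values are made up for illustration and are not the data used in the examples that follow.

```r
library(data.table)

# Small illustrative table (hypothetical values, not the tutorial's data)
dt_demo <- data.table(team = c('A', 'B'),
                      points = c(99, 95),
                      assists = c(22, 34))

my_columns <- c('team', 'assists')

# Option 1: the .. prefix, as used in Method 2
res_dots <- dt_demo[, ..my_columns]

# Option 2: with = FALSE treats j as a character vector of column names
res_with <- dt_demo[, my_columns, with = FALSE]

# Option 3: .SD restricted to the columns listed in .SDcols
res_sd <- dt_demo[, .SD, .SDcols = my_columns]

# Check that all three forms select the same columns
isTRUE(all.equal(res_dots, res_with))
isTRUE(all.equal(res_dots, res_sd))
```

Which form to use is largely a matter of taste. The .. prefix is the most compact, while .SDcols tends to be convenient when the selection is combined with grouping or with functions applied across columns.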

The examples below show how to use each of these methods in practice with the following data.table in R:

library(data.table)

#create data table
dt <- data.table(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 45, 35, 34, 45, 28, 31))

#view data table
dt

   team position points assists
1:    A        G     99      22
2:    A        G     68      28
3:    A        F     86      45
4:    A        F     88      35
5:    B        G     95      34
6:    B        G     74      45
7:    B        F     78      28
8:    B        F     93      31

The data.table contains the following columns:

  • team: The team name a player belongs to
  • position: The position of the player
  • points: The total points scored by the player
  • assists: The total assists made by the player

Example 1: Select Columns of data.table by Name

Suppose that we would like to select the team and assists columns from the data.table by referencing these column names directly.

We can use the following syntax to do so:

library(data.table)

#select team and assists columns
dt[, c('team', 'assists')]

   team assists
1:    A      22
2:    A      28
3:    A      45
4:    A      35
5:    B      34
6:    B      45
7:    B      28
8:    B      31

Notice that this returns all values from the team and assists columns in the data.table.

It’s worth noting that the order in which you specify the column names is the order in which the columns will be returned.

For example, we could use the following syntax to instead return the assists column first followed by the team column:

library(data.table)

#select assists and team columns
dt[, c('assists', 'team')]

   assists team
1:      22    A
2:      28    A
3:      45    A
4:      35    A
5:      34    B
6:      45    B
7:      28    B
8:      31    B

This returns all values from the assists and team columns in the data.table, in that particular order.
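Selecting by name can also be written with unquoted column names inside .(), and selecting a single column behaves differently depending on whether the name is quoted. The following is a minimal sketch of both points, using a small made-up table named dt_demo rather than the data above.

```r
library(data.table)

# Small illustrative table (hypothetical values)
dt_demo <- data.table(team = c('A', 'B'),
                      points = c(99, 95),
                      assists = c(22, 34))

# .() is an alias for list(); column names are given unquoted
by_list <- dt_demo[, .(assists, team)]

# A single quoted name returns a one-column data.table
one_col_dt <- dt_demo[, 'team']

# A single bare name returns the column as a plain vector
one_col_vec <- dt_demo[, team]

# Inspect the class of each result
class(one_col_dt)
class(one_col_vec)
```

The quoted form is usually the safer choice when later code expects a data.table, since the bare form drops down to a vector.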

Example 2: Select Columns of data.table Based on Vector

Suppose that we would like to store the column names in a vector and then use this vector to subset the data.table.

We can use the following syntax to do so:

library(data.table)

#specify columns to select
my_columns <- c('team', 'assists', 'points')

#subset data.table based on column names in vector
dt[, ..my_columns]

   team assists points
1:    A      22     99
2:    A      28     68
3:    A      45     86
4:    A      35     88
5:    B      34     95
6:    B      45     74
7:    B      28     78
8:    B      31     93

This returns the team, assists and points columns from the data.table, just as we specified in the vector named my_columns.

Once again, note that the order of column names you use in the vector will be the order in which the columns are returned.
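When the vector of names comes from somewhere else, such as user input or a config file, it may contain a name that is not in the data.table, in which case the selection raises an error. One possible guard, sketched below, keeps only the names that exist before subsetting. The table dt_demo, the vector wanted, and the column name 'rebounds' are all hypothetical and chosen only for illustration.

```r
library(data.table)

# Small illustrative table (hypothetical values)
dt_demo <- data.table(team = c('A', 'B'),
                      points = c(99, 95),
                      assists = c(22, 34))

# 'rebounds' is deliberately not a column of dt_demo
wanted <- c('team', 'assists', 'rebounds')

# Names that were requested but do not exist in the table
missing_cols <- setdiff(wanted, names(dt_demo))

# Names that do exist, kept in the order given in 'wanted'
present_cols <- intersect(wanted, names(dt_demo))

# Subset using only the valid names
res <- dt_demo[, ..present_cols]
res
```

Whether to silently drop missing names or stop with an error depends on the use case. Checking missing_cols and calling stop() when it is non-empty is a stricter alternative.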

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Filter a data.table in R
How to Sort a data.table in R
How to Group data.table by Multiple Columns in R
How to Use dcast Function from data.table in R
