R: How to Subset data.table Based on Multiple Conditions


Often you may want to subset a data.table in R based on multiple conditions.

You can use the following methods to do so:

Method 1: Subset data.table Where Multiple Conditions are Met

dt[team=='A' & assists>30]

This particular example will select all rows from the data.table named dt where the team column is equal to ‘A’ and the assists column has a value greater than 30.

Note that the & symbol represents “AND” logic.

Method 2: Subset data.table Where At Least One Condition is Met

dt[team=='A' | assists>30] 

This particular example will select all rows from the data.table named dt where the team column is equal to ‘A’ or the assists column has a value greater than 30.

Note that the | symbol represents “OR” logic.

The following example shows how to use each of these methods in practice with the following data.table in R:

library(data.table)

#create data table
dt <- data.table(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 45, 35, 34, 45, 28, 31))

#view data table
dt

   team position points assists
1:    A        G     99      22
2:    A        G     68      28
3:    A        F     86      45
4:    A        F     88      35
5:    B        G     95      34
6:    B        G     74      45
7:    B        F     78      28
8:    B        F     93      31

The data.table contains the following columns:

  • team: The team name a player belongs to
  • position: The position of the player
  • points: The total points scored by the player
  • assists: The total assists made by the player

Example 1: Subset data.table Where Multiple Conditions are Met

Suppose that we would like to select all rows in the data.table where the team column is equal to ‘A’ and the assists column has a value greater than 30.

We can use the following syntax to do so:

library(data.table)

#select rows where team is 'A' and assists is greater than 30
dt[team=='A' & assists>30] 

This returns the two rows from the data.table that meet both of the conditions that we specified.

Note that you can use the & symbol as many times as you would like to filter for rows that meet even more conditions.

Example 2: Subset data.table Where At Least One Condition is Met

Suppose that we would like to select all rows in the data.table where the team column is equal to ‘A’ or the assists column has a value greater than 30.

We can use the following syntax to do so:

library(data.table)

#select rows where team is 'A' or assists is greater than 30
dt[team=='A' | assists>30]

   team position points assists
1:    A        G     99      22
2:    A        G     68      28
3:    A        F     86      45
4:    A        F     88      35
5:    B        G     95      34
6:    B        G     74      45
7:    B        F     93      31

This returns all rows from the data.table that meet at least one of the conditions that we specified.

Note that only one row in the data.table did not meet at least one of these conditions.

Note that you can use the | symbol as many times as you would like to filter based on even more potential conditions.

Additional Resource

The following tutorials explain how to perform other common tasks in R:

How to Filter a data.table in R
How to Sort a data.table in R
How to Group data.table by Multiple Columns in R
How to Use dcast Function from data.table in R

Featured Posts

Leave a Reply

Your email address will not be published. Required fields are marked *