How to Get the Subset of a Data Structure in R Using subset()

This tutorial explains how to get the subset of a data structure in R using the subset() function.

Subsetting in R

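The subset() function in R offers a simple way to get a subset of a data frame, using the following syntax:

subset(x, subset, select)

Here x is the data frame, subset is a logical condition that picks the rows to keep, and select names the columns to keep (or, with a minus sign, the columns to drop).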
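
As a minimal sketch of this syntax, the call below passes all three arguments by name (x, subset, and select, as they are called in base R). The data frame scores and its columns are made up for illustration and do not appear elsewhere in this tutorial.

```r
# hypothetical data frame, used only for this illustration
scores <- data.frame(id = 1:4,
                     points = c(8, 15, 23, 4),
                     team = c('A', 'B', 'A', 'B'))

# x = data to filter, subset = row condition, select = columns to keep
result <- subset(x = scores, subset = points > 5, select = c(id, points))
print(result)
```

In everyday use the argument names are usually omitted and the three values are passed by position.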

Suppose we have the following data frame with three columns and five rows:

#create data frame with three columns and five rows
data <- data.frame(Name = c('Michael', 'Dwight', 'Andy', 'Jim', 'Stanley'),
                   Sales = c(12, 35, 22, 15, 18),
                   Hours = c(50, 55, 40, 30, 40))
head(data)

#     Name Sales Hours
#1 Michael    12    50
#2  Dwight    35    55
#3    Andy    22    40
#4     Jim    15    30
#5 Stanley    18    40

The following examples show several ways to subset this data frame, filtering rows by a condition and keeping or dropping columns.

#get all rows where 'Sales' is greater than 20
subset(data, Sales > 20)

#    Name Sales Hours
#2 Dwight    35    55
#3   Andy    22    40

#get all rows where 'Sales' is greater than 20, keep only 'Name' and 'Hours' columns
subset(data, Sales > 20, select = c('Name', 'Hours'))

#    Name Hours
#2 Dwight    55
#3   Andy    40

#get all rows where 'Sales' is greater than 20, leave out 'Sales' column
subset(data, Sales > 20, select = -Sales)

#    Name Hours
#2 Dwight    55
#3   Andy    40
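
The select argument is not limited to quoted names. As a brief sketch, it also accepts unquoted column names, a range of adjacent columns written with a colon, and a negated vector that drops several columns at once. The data frame is rebuilt here so the block runs on its own; the result names by_name, by_range, and dropped are chosen only for illustration.

```r
# same data frame as in the tutorial, rebuilt so this block is self-contained
data <- data.frame(Name = c('Michael', 'Dwight', 'Andy', 'Jim', 'Stanley'),
                   Sales = c(12, 35, 22, 15, 18),
                   Hours = c(50, 55, 40, 30, 40))

# unquoted column names work the same way as quoted ones
by_name <- subset(data, Sales > 20, select = c(Name, Hours))

# a range keeps every column from 'Name' through 'Sales'
by_range <- subset(data, Sales > 20, select = Name:Sales)

# a negated vector drops several columns at once
dropped <- subset(data, Sales > 20, select = -c(Sales, Hours))
```

Note that dropped is still a data frame with a single Name column, because subset() does not collapse one-column results into a vector by default.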

#get all rows where 'Sales' is greater than 15 and less than 30
subset(data, Sales > 15 & Sales < 30)

#     Name Sales Hours
#3    Andy    22    40
#5 Stanley    18    40

#get all rows where 'Sales' is less than 15 or greater than 30
subset(data, Sales < 15 | Sales > 30)

#     Name Sales Hours
#1 Michael    12    50
#2  Dwight    35    55

#get all rows where 'Sales' is less than 15 or greater than 30, leave out 'Hours'
subset(data, Sales < 15 | Sales > 30, select = -Hours)

#     Name Sales
#1 Michael    12
#2  Dwight    35

#get all rows where 'Name' is equal to 'Michael' or 'Dwight'
subset(data, Name %in% c('Michael', 'Dwight'))

#     Name Sales Hours
#1 Michael    12    50
#2  Dwight    35    55

#get all rows where 'Name' is neither 'Michael' nor 'Dwight'
subset(data, !(Name %in% c('Michael', 'Dwight')))

#     Name Sales Hours
#3    Andy    22    40
#4     Jim    15    30
#5 Stanley    18    40

#get all rows where 'Hours' equals 40, leave out 'Name' column
subset(data, Hours == 40, select = -Name)

#  Sales Hours
#3    22    40
#5    18    40
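
The examples above all use a data frame, but subset() also works on a plain vector, in which case only the condition applies and there are no columns to select. The sketch below uses a made-up vector named sales that includes one missing value, to show that elements whose condition evaluates to NA are left out of the result.

```r
# hypothetical sales figures as a plain numeric vector, with one missing value
sales <- c(12, 35, NA, 22, 15, 18)

# keep the elements greater than 20; the NA is dropped because a
# missing condition is treated as FALSE
high_sales <- subset(sales, sales > 20)
print(high_sales)
# [1] 35 22
```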
