This tutorial explains how to get a subset of a data structure in R using the **subset()** function.

**Subsetting in R**

The **subset()** function in R offers a simple way to get a subset of a data structure using the following syntax:

**subset(data frame name, rows you want, columns you want)**
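In R's documentation these three arguments are named `x`, `subset`, and `select`. As a minimal sketch, the call below spells out the argument names on a small made-up data frame called `scores`, which is assumed here only for illustration and is not part of this tutorial's data:

```r
# hypothetical data frame used only for this illustration
scores <- data.frame(id = 1:4,
                     points = c(10, 25, 30, 5),
                     team = c('A', 'B', 'A', 'B'))

# x      = the data frame to take rows and columns from
# subset = a logical condition that picks the rows to keep
# select = the columns to keep
result <- subset(x = scores, subset = points > 8, select = c(id, points))
result
```

Column names inside `subset` and `select` can be written without quotes or a `scores$` prefix, because both arguments are evaluated inside the data frame.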

Suppose we have the following data frame with three columns and five rows:

    #create data frame with three columns and five rows
    data <- data.frame(Name = c('Michael', 'Dwight', 'Andy', 'Jim', 'Stanley'),
                       Sales = c(12, 35, 22, 15, 18),
                       Hours = c(50, 55, 40, 30, 40))

    head(data)

    #     Name Sales Hours
    #1 Michael    12    50
    #2  Dwight    35    55
    #3    Andy    22    40
    #4     Jim    15    30
    #5 Stanley    18    40

The following code shows several ways to subset this data frame, filtering rows by condition and keeping or dropping columns.

    #get all rows where 'Sales' is greater than 20
    subset(data, Sales > 20)

    #    Name Sales Hours
    #2 Dwight    35    55
    #3   Andy    22    40

    #get all rows where 'Sales' is greater than 20, and get 'Name' and 'Hours' columns
    subset(data, Sales > 20, select = c('Name', 'Hours'))

    #    Name Hours
    #2 Dwight    55
    #3   Andy    40

    #get all rows where 'Sales' is greater than 20, leave out 'Sales' column
    subset(data, Sales > 20, select = -Sales)

    #    Name Hours
    #2 Dwight    55
    #3   Andy    40

    #get all rows where 'Sales' is greater than 15 and less than 30
    subset(data, Sales > 15 & Sales < 30)

    #     Name Sales Hours
    #3    Andy    22    40
    #5 Stanley    18    40

    #get all rows where 'Sales' is less than 15 or greater than 30
    subset(data, Sales < 15 | Sales > 30)

    #     Name Sales Hours
    #1 Michael    12    50
    #2  Dwight    35    55

    #get all rows where 'Sales' is less than 15 or greater than 30, leave out 'Hours'
    subset(data, Sales < 15 | Sales > 30, select = -Hours)

    #     Name Sales
    #1 Michael    12
    #2  Dwight    35

    #get all rows where 'Name' is equal to 'Michael' or 'Dwight'
    subset(data, Name %in% c('Michael', 'Dwight'))

    #     Name Sales Hours
    #1 Michael    12    50
    #2  Dwight    35    55

    #get all rows where 'Name' is neither 'Michael' nor 'Dwight'
    subset(data, !(Name %in% c('Michael', 'Dwight')))

    #     Name Sales Hours
    #3    Andy    22    40
    #4     Jim    15    30
    #5 Stanley    18    40

    #get all rows where 'Hours' equals 40, leave out 'Name' column
    subset(data, Hours == 40, select = -Name)

    #  Sales Hours
    #3    22    40
    #5    18    40
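For comparison, the same kind of selection can also be written with base R bracket indexing. The sketch below rebuilds the tutorial's data frame so that it runs on its own, then adds a second data frame, `with_na`, whose names and values are made up for illustration, to show one place where the two approaches differ: `subset()` drops rows where the condition evaluates to `NA`, while bracket indexing returns them as rows of `NA`.

```r
# same data frame as in the tutorial
data <- data.frame(Name = c('Michael', 'Dwight', 'Andy', 'Jim', 'Stanley'),
                   Sales = c(12, 35, 22, 15, 18),
                   Hours = c(50, 55, 40, 30, 40))

# subset() version: rows where Sales > 20, columns Name and Hours
by_subset <- subset(data, Sales > 20, select = c(Name, Hours))

# bracket-indexing version of the same selection
by_bracket <- data[data$Sales > 20, c('Name', 'Hours')]

# compare the two results
all.equal(by_subset, by_bracket)

# hypothetical data frame with a missing Sales value (illustration only)
with_na <- data.frame(Name = c('Pam', 'Kevin'), Sales = c(NA, 40))

# subset() treats an NA condition as "do not keep this row"
n_subset <- nrow(subset(with_na, Sales > 20))

# bracket indexing keeps the NA condition as a row filled with NA
n_bracket <- nrow(with_na[with_na$Sales > 20, ])
```

R's help page for `subset()` describes it as a convenience function intended for interactive use and recommends bracket indexing when writing functions or scripts that others will reuse.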
