This tutorial explains how to select and filter the elements of vectors and data frames in R using indexing.

**Indexing Vectors**

Suppose we have a vector of ten elements:

    #create a vector x of 10 elements
    x <- c(12, 16, 25, 34, 23, 1, 7, 23, 6, 3)

The following code illustrates how to retrieve various elements from the vector **based on their position in the vector**:

    #return third element in the vector
    x[3]
    #[1] 25

    #return tenth element in the vector
    x[10]
    #[1] 3

    #return first three elements in the vector
    x[1:3]
    #[1] 12 16 25

    #return first, third, and fifth elements in the vector
    x[c(1, 3, 5)]
    #[1] 12 25 23
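
Positions do not have to be typed by hand. As a minimal sketch that reuses the same vector, the index can be computed with base R helpers such as `seq()`, `length()`, and `tail()`; the variable name `odd_positions` is only illustrative:

```r
# same vector as in the tutorial
x <- c(12, 16, 25, 34, 23, 1, 7, 23, 6, 3)

# every second element, starting at position 1
odd_positions <- seq(1, length(x), by = 2)
x[odd_positions]
#[1] 12 25 23  7  6

# last three elements
tail(x, 3)
#[1] 23  6  3

# a position past the end of the vector returns NA rather than an error
x[11]
#[1] NA
```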

The following code illustrates how to retrieve various elements from the vector **based on boolean values**:

    #create a vector x of 10 elements
    x <- c(12, 16, 25, 34, 23, 1, 7, 23, 6, 3)

    #find out if each value in vector x is greater than 10
    x > 10
    #[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE

    #return the values in vector x that are greater than 10
    x[x>10]
    #[1] 12 16 25 34 23 23

    #return the values in vector x that are less than 10
    x[x<10]
    #[1] 1 7 6 3

    #return the values in vector x that are greater than 10 and less than 20
    x[x>10 & x<20]
    #[1] 12 16

    #return the values in vector x that are less than 10 or greater than 20
    x[x<10 | x>20]
    #[1] 25 34 23  1  7 23  6  3
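
One thing to watch with logical indexing is missing data. The sketch below uses a hypothetical vector `y` (not part of the tutorial's data) to show that a comparison against `NA` yields `NA`, which then shows up in the result, and that wrapping the condition in `which()` is one way to avoid this:

```r
# hypothetical vector with a missing value (not from the tutorial)
y <- c(12, NA, 25, 3)

# a logical index keeps an NA wherever the comparison itself is NA
y[y > 10]
#[1] 12 NA 25

# which() returns the positions where the condition is TRUE and skips NA
which(y > 10)
#[1] 1 3

y[which(y > 10)]
#[1] 12 25
```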

The following code illustrates how to retrieve various elements from the vector **based on negative indexing**:

    #create a vector x of 10 elements
    x <- c(12, 16, 25, 34, 23, 1, 7, 23, 6, 3)

    #return every element in vector x except for the first element
    x[-1]
    #[1] 16 25 34 23  1  7 23  6  3

    #return every element in vector x except for the first four elements
    x[-1:-4]
    #[1] 23  1  7 23  6  3

    #return every element in vector x except elements in positions 1, 4, and 7
    x[c(-1, -4, -7)]
    #[1] 16 25 23  1 23  6  3

    #return every element in vector x except for the last element
    x[-length(x)]
    #[1] 12 16 25 34 23  1  7 23  6
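
A few details of negative indexing are easy to trip over. The following sketch, again on the same vector, shows that `-(1:4)` is equivalent to `-1:-4` (whereas `-1:4` without parentheses would mean the sequence from -1 up to 4), that `head()` with a negative count is another way to drop trailing elements, and that positive and negative positions cannot be mixed. The name `mixed` is only illustrative:

```r
x <- c(12, 16, 25, 34, 23, 1, 7, 23, 6, 3)

# -(1:4) is an equivalent way to write -1:-4
identical(x[-(1:4)], x[-1:-4])
#[1] TRUE

# head() with a negative n drops that many elements from the end
head(x, -1)
#[1] 12 16 25 34 23  1  7 23  6

# positive and negative positions cannot be combined in one index;
# try() is used here so the script keeps running after the error
mixed <- try(x[c(-1, 2)], silent = TRUE)
inherits(mixed, "try-error")
#[1] TRUE
```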

**Indexing Data Frames**

Suppose we have the following data frame:

    #create a data frame with three columns and five rows
    data <- data.frame(Name = c('Michael', 'Dwight', 'Andy', 'Jim', 'Stanley'),
                       Sales = c(12, 35, 22, 15, 18),
                       Hours = c(50, 55, 40, 30, 40))

    head(data)
    #     Name Sales Hours
    #1 Michael    12    50
    #2  Dwight    35    55
    #3    Andy    22    40
    #4     Jim    15    30
    #5 Stanley    18    40

The following code illustrates how to retrieve rows and columns from the data frame **based on row and column position**:

    #return the element at row 4, column 3
    data[4, 3]
    #[1] 30

    #return the element at row 4, column 'Hours'
    data[4, 'Hours']
    #[1] 30

    #return the elements in row 2, all columns
    data[2, ]
    #    Name Sales Hours
    #2 Dwight    35    55

    #return the elements in rows 2 and 4, all columns
    data[c(2, 4), ]
    #    Name Sales Hours
    #2 Dwight    35    55
    #4    Jim    15    30

    #return the elements in rows 2 and 4, but only column 3
    data[c(2, 4), 3]
    #[1] 55 30

    #return the elements in rows 2 and 4, but only columns 1 and 3
    data[c(2, 4), c(1,3)]
    #    Name Hours
    #2 Dwight    55
    #4    Jim    30

    #return the elements in rows 2 and 4, but only columns 'Name' and 'Hours'
    data[c(2, 4), c('Name', 'Hours')]
    #    Name Hours
    #2 Dwight    55
    #4    Jim    30
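
Note that selecting a single column, as in `data[c(2, 4), 3]`, returns a plain vector rather than a data frame. If a data frame is wanted instead, one option is the `drop = FALSE` argument of `[`, sketched here on the same data:

```r
data <- data.frame(Name = c('Michael', 'Dwight', 'Andy', 'Jim', 'Stanley'),
                   Sales = c(12, 35, 22, 15, 18),
                   Hours = c(50, 55, 40, 30, 40))

# selecting a single column returns a plain vector by default
class(data[c(2, 4), 3])
#[1] "numeric"

# drop = FALSE keeps the result as a data frame
data[c(2, 4), 3, drop = FALSE]
#  Hours
#2    55
#4    30
```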

The following code illustrates how to retrieve various elements from the data frame **based on boolean values**:

    #find out if each value in column 'Hours' is greater than 40
    data$Hours > 40
    #[1]  TRUE  TRUE FALSE FALSE FALSE

    #return all rows in data frame where 'Hours' is greater than 40
    data[data$Hours>40, ]
    #     Name Sales Hours
    #1 Michael    12    50
    #2  Dwight    35    55

    #return all rows in data frame where 'Hours' is greater than 30 and less than 50
    data[data$Hours>30 & data$Hours<50, ]
    #     Name Sales Hours
    #3    Andy    22    40
    #5 Stanley    18    40

    #return all rows where name is equal to 'Andy' or 'Stanley'
    data[data$Name %in% c('Andy', 'Stanley'), ]
    #     Name Sales Hours
    #3    Andy    22    40
    #5 Stanley    18    40

    #return number of hours worked for 'Andy' and 'Stanley'
    data[data$Name %in% c('Andy', 'Stanley'), 'Hours']
    #[1] 40 40

    #return number of hours worked (column 3) for 'Andy' and 'Stanley'
    data[data$Name %in% c('Andy', 'Stanley'), 3]
    #[1] 40 40
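
The same row filters can also be written with base R's `subset()`, which lets the condition refer to column names directly. This is offered only as an alternative sketch, not as a replacement for the bracket syntax above; R's own documentation describes `subset()` as a convenience intended mainly for interactive use:

```r
data <- data.frame(Name = c('Michael', 'Dwight', 'Andy', 'Jim', 'Stanley'),
                   Sales = c(12, 35, 22, 15, 18),
                   Hours = c(50, 55, 40, 30, 40))

# rows where Hours is greater than 40, written with subset()
subset(data, Hours > 40)
#     Name Sales Hours
#1 Michael    12    50
#2  Dwight    35    55

# rows where Hours is between 30 and 50, keeping only Name and Hours
subset(data, Hours > 30 & Hours < 50, select = c(Name, Hours))
#     Name Hours
#3    Andy    40
#5 Stanley    40
```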

The following code illustrates how to retrieve various elements from the data frame **based on negative indexing**:

    #return all rows in data frame except for the first row
    data[-1, ]
    #     Name Sales Hours
    #2  Dwight    35    55
    #3    Andy    22    40
    #4     Jim    15    30
    #5 Stanley    18    40

    #return all elements in data frame except for the first row and first column
    data[-1, -1]
    #  Sales Hours
    #2    35    55
    #3    22    40
    #4    15    30
    #5    18    40

    #return all elements in data frame except for the first row and 'Name' column
    data[-1, c('Sales', 'Hours')]
    #  Sales Hours
    #2    35    55
    #3    22    40
    #4    15    30
    #5    18    40
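
The last example lists the columns to keep because a minus sign cannot be applied to a column name. As a sketch of two possible workarounds when the column to exclude is known by name, a logical comparison on `names(data)` or `setdiff()` can build the selection; the variable name `attempt` is only illustrative:

```r
data <- data.frame(Name = c('Michael', 'Dwight', 'Andy', 'Jim', 'Stanley'),
                   Sales = c(12, 35, 22, 15, 18),
                   Hours = c(50, 55, 40, 30, 40))

# a minus sign in front of a column name is an error, because
# character strings cannot be negated; try() captures the error
attempt <- try(data[-1, -'Name'], silent = TRUE)
inherits(attempt, "try-error")
#[1] TRUE

# one workaround: keep every column whose name is not 'Name'
data[-1, names(data) != 'Name']
#  Sales Hours
#2    35    55
#3    22    40
#4    15    30
#5    18    40

# another: setdiff() builds the vector of column names to keep
data[-1, setdiff(names(data), 'Name')]
#  Sales Hours
#2    35    55
#3    22    40
#4    15    30
#5    18    40
```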