Filtering a Data Structure in R Using Indexing

This tutorial explains how to filter vectors and data frames in R using positional, logical, and negative indexing.

Indexing Vectors

Suppose we have a vector of ten elements:

#create a vector x of 10 elements
x <- c(12, 16, 25, 34, 23, 1, 7, 23, 6, 3)

The following code illustrates how to retrieve various elements from the vector based on their position in the vector:

#return third element in the vector
x[3]

#[1] 25

#return tenth element in the vector
x[10]

#[1] 3

#return first three elements in the vector
x[1:3]

#[1] 12 16 25

#return first, third, and fifth elements in the vector
x[c(1, 3, 5)]

#[1] 12 25 23
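
Positional indexing has a few edge cases worth knowing. The following is a minimal sketch, using the same vector x, of what happens with an index that is out of range, an index of 0, and a repeated index. The variable names out_of_range, zero_index, and repeated are only illustrative:

```r
#create a vector x of 10 elements
x <- c(12, 16, 25, 34, 23, 1, 7, 23, 6, 3)

#an index past the end of the vector returns NA instead of raising an error
out_of_range <- x[11]
out_of_range

#[1] NA

#index 0 selects nothing, so the result is an empty numeric vector
zero_index <- x[0]
zero_index

#numeric(0)

#an index can be repeated to return the same element more than once
repeated <- x[c(2, 2, 10)]
repeated

#[1] 16 16 3
```

Because an out-of-range index returns NA silently, it can help to compare positions against length(x) before indexing.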

The following code illustrates how to retrieve various elements from the vector based on logical (TRUE/FALSE) values:

#create a vector x of 10 elements
x <- c(12, 16, 25, 34, 23, 1, 7, 23, 6, 3)

#find out if each value in vector x is greater than 10
x > 10

#[1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE

#return the values in vector x that are greater than 10
x[x>10]

#[1] 12 16 25 34 23 23

#return the values in vector x that are less than 10
x[x<10]

#[1] 1 7 6 3

#return the values in vector x that are greater than 10 and less than 20
x[x>10 & x<20]

#[1] 12 16

#return the values in vector x that are less than 10 or greater than 20
x[x<10 | x>20]

#[1] 25 34 23 1 7 23 6 3
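
Logical indexing behaves differently when a vector contains missing values. The following sketch uses a small hypothetical vector y (not part of the examples above) to show that an NA in the condition produces an NA in the result, and that which() is one way to keep only the positions where the condition is TRUE:

```r
#create a vector y that contains a missing value (hypothetical data)
y <- c(5, NA, 15, 8, 22)

#the comparison returns NA wherever y is NA
y > 10

#[1] FALSE NA TRUE FALSE TRUE

#logical indexing keeps an NA in the result for each NA in the condition
y[y > 10]

#[1] NA 15 22

#which() returns only the positions where the condition is TRUE
which(y > 10)

#[1] 3 5

#indexing with those positions drops the NA
y[which(y > 10)]

#[1] 15 22
```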

The following code illustrates how to retrieve various elements from the vector based on negative indexing:

#create a vector x of 10 elements
x <- c(12, 16, 25, 34, 23, 1, 7, 23, 6, 3)

#return every element in vector x except for the first element
x[-1]

#[1] 16 25 34 23 1 7 23 6 3

#return every element in vector x except for the first four elements (-1:-4 expands to -1, -2, -3, -4)
x[-1:-4]

#[1] 23 1 7 23 6 3

#return every element in vector x except elements in positions 1, 4, and 7
x[c(-1, -4, -7)]

#[1] 16 25 23 1 23 6 3

#return every element in vector x except for the last element
x[-length(x)]

#[1] 12 16 25 34 23 1 7 23 6
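
As a possible alternative to negative indexing, the base R functions head() and tail() accept a negative n to drop elements from the end or the start of a vector. The sketch below also shows that positive and negative positions cannot be combined in one index. The tryCatch() wrapper and the name mixed are only there to capture the error for illustration:

```r
#create a vector x of 10 elements
x <- c(12, 16, 25, 34, 23, 1, 7, 23, 6, 3)

#head() with a negative n returns everything except the last n elements
head(x, -1)

#[1] 12 16 25 34 23 1 7 23 6

#tail() with a negative n returns everything except the first n elements
tail(x, -4)

#[1] 23 1 7 23 6 3

#mixing positive and negative positions in one index raises an error
mixed <- tryCatch(x[c(-1, 2)], error = function(e) "error")
mixed

#[1] "error"
```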

Indexing Data Frames

Suppose we have the following data frame:

#create a data frame with three columns and five rows
data <- data.frame(Name = c('Michael', 'Dwight', 'Andy', 'Jim', 'Stanley'),
                   Sales = c(12, 35, 22, 15, 18),
                   Hours = c(50, 55, 40, 30, 40))
head(data)

#     Name Sales Hours
#1 Michael    12    50
#2  Dwight    35    55
#3    Andy    22    40
#4     Jim    15    30
#5 Stanley    18    40

The following code illustrates how to retrieve rows and columns from the data frame based on row and column position:

#return the element at row 4, column 3
data[4, 3]

#[1] 30

#return the element at row 4, column 'Hours'
data[4, 'Hours']

#[1] 30

#return the elements in row 2, all columns
data[2, ]

#    Name Sales Hours
#2 Dwight    35    55

#return the elements in rows 2 and 4, all columns
data[c(2, 4), ]

#    Name Sales Hours
#2 Dwight    35    55
#4    Jim    15    30

#return the elements in rows 2 and 4, but only column 3
data[c(2, 4), 3]

#[1] 55 30

#return the elements in rows 2 and 4, but only columns 1 and 3
data[c(2, 4), c(1,3)]

#    Name Hours
#2 Dwight    55
#4    Jim    30

#return the elements in rows 2 and 4, but only columns 'Name' and 'Hours'
data[c(2, 4), c('Name', 'Hours')]

#    Name Hours
#2 Dwight    55
#4    Jim    30
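
Note that selecting a single column by position returned a plain vector above, while selecting two columns returned a data frame. The following sketch, which recreates the same data frame, shows how the drop = FALSE argument keeps a single-column result as a data frame, and how [[ ]] and $ extract a whole column. The names hours_vec and hours_df are only illustrative:

```r
#create a data frame with three columns and five rows
data <- data.frame(Name = c('Michael', 'Dwight', 'Andy', 'Jim', 'Stanley'),
                   Sales = c(12, 35, 22, 15, 18),
                   Hours = c(50, 55, 40, 30, 40))

#a single column selected by position is simplified to a vector
hours_vec <- data[c(2, 4), 3]
class(hours_vec)

#[1] "numeric"

#drop = FALSE keeps the result as a data frame with its row names
hours_df <- data[c(2, 4), 3, drop = FALSE]
hours_df

#  Hours
#2    55
#4    30

#[[ ]] and $ both extract an entire column as a vector
data[[3]]

#[1] 50 55 40 30 40

identical(data[[3]], data$Hours)

#[1] TRUE
```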

The following code illustrates how to retrieve various elements from the data frame based on logical (TRUE/FALSE) values:

#find out if each value in column 'Hours' is greater than 40
data$Hours > 40

#[1] TRUE TRUE FALSE FALSE FALSE

#return all rows in data frame where 'Hours' is greater than 40
data[data$Hours>40, ]

#     Name Sales Hours
#1 Michael    12    50
#2  Dwight    35    55

#return all rows in data frame where 'Hours' is greater than 30 and less than 50
data[data$Hours>30 & data$Hours<50, ]

#     Name Sales Hours
#3    Andy    22    40
#5 Stanley    18    40

#return all rows where name is equal to 'Andy' or 'Stanley'
data[data$Name %in% c('Andy', 'Stanley'), ]

#     Name Sales Hours
#3    Andy    22    40
#5 Stanley    18    40

#return number of hours worked for 'Andy' and 'Stanley'
data[data$Name %in% c('Andy', 'Stanley'), 'Hours']

#[1] 40 40

#return number of hours worked (column 3) for 'Andy' and 'Stanley'
data[data$Name %in% c('Andy', 'Stanley'), 3]

#[1] 40 40
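
The same row filters can also be written with the base R function subset(), which evaluates the condition inside the data frame so the data$ prefix is not needed. This is a sketch of an alternative rather than a replacement for bracket indexing. The names high_hours and andy_stanley are only illustrative:

```r
#create a data frame with three columns and five rows
data <- data.frame(Name = c('Michael', 'Dwight', 'Andy', 'Jim', 'Stanley'),
                   Sales = c(12, 35, 22, 15, 18),
                   Hours = c(50, 55, 40, 30, 40))

#return all rows where 'Hours' is greater than 40
high_hours <- subset(data, Hours > 40)
high_hours

#     Name Sales Hours
#1 Michael    12    50
#2  Dwight    35    55

#the select argument chooses which columns to keep
andy_stanley <- subset(data, Name %in% c('Andy', 'Stanley'), select = c(Name, Hours))
andy_stanley

#     Name Hours
#3    Andy    40
#5 Stanley    40
```

Unlike bracket indexing, subset() drops rows where the condition evaluates to NA.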

The following code illustrates how to retrieve various elements from the data frame based on negative indexing:

#return all rows in data frame except for the first row
data[-1, ]

#     Name Sales Hours
#2  Dwight    35    55
#3    Andy    22    40
#4     Jim    15    30
#5 Stanley    18    40

#return all elements in data frame except for the first row and first column
data[-1, -1]

#  Sales Hours
#2    35    55
#3    22    40
#4    15    30
#5    18    40

#return all rows except the first, keeping only the 'Sales' and 'Hours' columns
#(column names cannot be negated, so the columns to keep are listed instead)
data[-1, c('Sales', 'Hours')]

#  Sales Hours
#2    35    55
#3    22    40
#4    15    30
#5    18    40
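
Since a minus sign cannot be applied to a column name, one way to exclude a column by name is to build the selection from names(data). The sketch below shows two possible approaches, one with a logical vector and one with setdiff(). The names keep and no_name are only illustrative:

```r
#create a data frame with three columns and five rows
data <- data.frame(Name = c('Michael', 'Dwight', 'Andy', 'Jim', 'Stanley'),
                   Sales = c(12, 35, 22, 15, 18),
                   Hours = c(50, 55, 40, 30, 40))

#build a logical vector that is TRUE for every column except 'Name'
keep <- !(names(data) %in% 'Name')
keep

#[1] FALSE TRUE TRUE

#return all rows except the first, and all columns except 'Name'
no_name <- data[-1, keep]
no_name

#  Sales Hours
#2    35    55
#3    22    40
#4    15    30
#5    18    40

#setdiff() returns the column names that remain after removing 'Name'
setdiff(names(data), 'Name')

#[1] "Sales" "Hours"
```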
