How to Use the split() Function in R to Split Data


The split() function in R can be used to split data into groups based on factor levels.

This function uses the following basic syntax:

split(x, f, ...)

where:

  • x: The vector or data frame to divide into groups
  • f: A factor (or a vector that can be coerced to a factor, or a list of such vectors) that defines the groupings

The following examples show how to use this function to split vectors and data frames into groups.

Example 1: Use split() to Split Vector Into Groups

The following code shows how to split a vector of data values into groups based on a second vector of group labels, which split() treats as a factor:

#create vector of data values
data <- c(1, 2, 3, 4, 5, 6)

#create vector of groupings
groups <- c('A', 'B', 'B', 'B', 'C', 'C')

#split vector of data values into groups
split(x = data, f = groups)

$A
[1] 1

$B
[1] 2 3 4

$C
[1] 5 6

The result is a named list with three elements, one for each group.
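Because split() returns a list, each group can be processed with functions such as sapply(). The following is a minimal sketch that reuses the data and groups vectors from above; the names parts, group_sums, and restored are illustrative only:

```r
#create the same vectors used above
data <- c(1, 2, 3, 4, 5, 6)
groups <- c('A', 'B', 'B', 'B', 'C', 'C')

#store the result of split() so it can be reused
parts <- split(x = data, f = groups)

#compute the sum of the values in each group
group_sums <- sapply(parts, sum)
group_sums

#put the groups back together in the original order
restored <- unsplit(parts, f = groups)
restored
```

Here unsplit() is the base R counterpart to split(), so restored should contain the same values as data.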

Note that you can use indexing to retrieve specific groups as well. Single brackets return a one-element list that contains the selected group:

#split vector of data values into groups and only display second group
split(x = data, f = groups)[2]

$B
[1] 2 3 4
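If the values themselves are needed rather than a list, double brackets or the $ operator can be used instead. A short sketch, assuming the same data and groups vectors as above (the variable names are illustrative):

```r
#create the same vectors used above
data <- c(1, 2, 3, 4, 5, 6)
groups <- c('A', 'B', 'B', 'B', 'C', 'C')

parts <- split(x = data, f = groups)

#single brackets keep the list structure
one_element_list <- parts[2]

#double brackets extract the vector stored in the second group
b_values <- parts[[2]]

#the $ operator extracts the same vector by group name
b_by_name <- parts$B
```

Both b_values and b_by_name hold the plain numeric vector for group B, while one_element_list is still a list of length one.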

Example 2: Use split() to Split Data Frame Into Groups

Suppose we have the following data frame in R:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'G', 'F', 'F'),
                 points=c(33, 28, 31, 39, 34, 44),
                 assists=c(30, 28, 24, 24, 28, 19))

#view data frame
df

  team position points assists
1    A        G     33      30
2    A        G     28      28
3    A        F     31      24
4    B        G     39      24
5    B        F     34      28
6    B        F     44      19

We can use the following code to split the data frame into groups based on the ‘team’ variable:

#split data frame into groups based on 'team'
split(df, f = df$team)

$A
  team position points assists
1    A        G     33      30
2    A        G     28      28
3    A        F     31      24

$B
  team position points assists
4    B        G     39      24
5    B        F     34      28
6    B        F     44      19

The result is two groups. The first contains only rows where ‘team’ is equal to A and the second contains only rows where ‘team’ is equal to B.
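A common reason to split a data frame is to compute a summary for each group. The following sketch assumes the same df as above and uses sapply() to calculate the mean points and the number of rows per team; by_team, mean_points, and rows_per_team are illustrative names:

```r
#create the same data frame used above
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'G', 'F', 'F'),
                 points=c(33, 28, 31, 39, 34, 44),
                 assists=c(30, 28, 24, 24, 28, 19))

#split data frame into one data frame per team
by_team <- split(df, f = df$team)

#calculate mean points for each team
mean_points <- sapply(by_team, function(d) mean(d$points))
mean_points

#count the rows in each team's data frame
rows_per_team <- sapply(by_team, nrow)
rows_per_team
```

Each element of by_team is an ordinary data frame, so any function that accepts a data frame can be applied to it in the same way.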

Note that we can also split the data into groups using multiple factor variables. For example, the following code shows how to split the data into groups based on the ‘team’ and ‘position’ variables:

#split data frame into groups based on 'team' and 'position' variables
split(df, f = list(df$team, df$position))

$A.F
  team position points assists
3    A        F     31      24

$B.F
  team position points assists
5    B        F     34      28
6    B        F     44      19

$A.G
  team position points assists
1    A        G     33      30
2    A        G     28      28

$B.G
  team position points assists
4    B        G     39      24

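The result is four groups, one for each combination of ‘team’ and ‘position’. Each group name joins the two values with a period.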
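When some combinations of the factors do not occur in the data, split() still returns an empty group for them by default. The sketch below relies on the drop and sep arguments described in the split() help page; the data frame df2, which omits the only row for team B at position G, is a hypothetical variation on df introduced for illustration:

```r
#create the same data frame used above
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'G', 'F', 'F'),
                 points=c(33, 28, 31, 39, 34, 44),
                 assists=c(30, 28, 24, 24, 28, 19))

#remove row 4 so that no row has team B and position G
df2 <- df[-4, ]

#by default the empty combination is kept as a data frame with zero rows
all_combos <- split(df2, f = list(df2$team, df2$position))
names(all_combos)

#drop = TRUE removes combinations that do not occur
non_empty <- split(df2, f = list(df2$team, df2$position), drop = TRUE)
names(non_empty)

#sep changes the character used to join the factor values in the names
underscored <- split(df2, f = list(df2$team, df2$position), drop = TRUE, sep = '_')
names(underscored)
```

Using drop = TRUE can be convenient when the groups are passed to a function that does not handle empty data frames.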

Additional Resources

The following tutorials explain how to use other common functions in R:

How to Use summary() Function in R
How to Use the replicate() Function in R
How to Use match() Function in R
