How to Create a Crosstab Using dplyr (With Examples)


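You can use the following basic syntax to produce a crosstab using functions from the dplyr and tidyr packages in R, where var1 and var2 are the two categorical variables to count and the variable passed to spread() becomes the columns of the crosstab: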

df %>%
  group_by(var1, var2) %>%
  tally() %>%
  spread(var1, n)

The following examples show how to use this syntax in practice.

Example 1: Create Basic Crosstab

Suppose we have the following data frame in R:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'C', 'G', 'F', 'F', 'C'),
                 points=c(7, 7, 8, 11, 13, 15, 19, 13))

#view data frame
df

  team position points
1    A        G      7
2    A        G      7
3    A        F      8
4    A        C     11
5    B        G     13
6    B        F     15
7    B        F     19
8    B        C     13

We can use the following syntax to create a crosstab for the ‘team’ and ‘position’ variables:

library(dplyr)
library(tidyr)

#produce crosstab 
df %>%
  group_by(team, position) %>%
  tally() %>%
  spread(team, n)

# A tibble: 3 x 3
  position     A     B
1 C            1     1
2 F            1     2
3 G            2     1

Here’s how to interpret the values in the crosstab:

  • There is 1 player who has a position of ‘C’ and belongs to team ‘A’
  • There is 1 player who has a position of ‘C’ and belongs to team ‘B’
  • There is 1 player who has a position of ‘F’ and belongs to team ‘A’
  • There are 2 players who have a position of ‘F’ and belong to team ‘B’
  • There are 2 players who have a position of ‘G’ and belong to team ‘A’
  • There is 1 player who has a position of ‘G’ and belongs to team ‘B’
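
The spread() function still works, but the tidyr documentation marks it as superseded by pivot_wider(). As a minimal sketch, the same crosstab can be built with count() and pivot_wider(). The values_fill = 0 argument is an optional addition here, not part of the original example. It shows 0 instead of NA for any team and position combination that never occurs in the data:

```r
library(dplyr)
library(tidyr)

#same data frame as in the example above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'C', 'G', 'F', 'F', 'C'),
                 points=c(7, 7, 8, 11, 13, 15, 19, 13))

#count() is shorthand for group_by() + tally() and returns an ungrouped result
crosstab <- df %>%
  count(team, position) %>%
  pivot_wider(names_from = team, values_from = n, values_fill = 0)

#view crosstab with 'team' along columns
crosstab
```

The counts match the spread() output above: one row per position, with columns ‘A’ and ‘B’.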

Note that we can switch the rows and columns of the crosstab by passing the other variable to the spread() function, so that ‘position’ rather than ‘team’ is spread across the columns:

library(dplyr)
library(tidyr)

#produce crosstab with 'position' along columns
df %>%
  group_by(team, position) %>%
  tally() %>%
  spread(position, n)

# A tibble: 2 x 4
# Groups:   team [2]
  team      C     F     G
1 A         1     1     2
2 B         1     2     1
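
The ‘# Groups: team [2]’ line appears because tally() removes only the last grouping variable, so the result is still grouped by ‘team’. If a plain, ungrouped tibble is preferred, one possible approach (a sketch, not a required step) is to call ungroup() before spread():

```r
library(dplyr)
library(tidyr)

#same data frame as in the example above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'C', 'G', 'F', 'F', 'C'),
                 points=c(7, 7, 8, 11, 13, 15, 19, 13))

#drop the leftover grouping by 'team' before reshaping
wide <- df %>%
  group_by(team, position) %>%
  tally() %>%
  ungroup() %>%
  spread(position, n)

#check that the result is no longer grouped
is_grouped_df(wide) # FALSE
```

The counts are unchanged. Only the grouping metadata is removed, which keeps later steps such as mutate() or summarise() from operating separately on each team.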

Related: How to Use Spread Function in tidyr

