How to Use the CrossTable() Function in R


Often you may want to calculate a cross-tabulation table in R to summarize the relationship between two categorical variables.

The CrossTable() function from the gmodels package in R is designed for exactly this task.

The CrossTable() function uses the following basic syntax:

CrossTable(x, y, digits=3, ...)

where:

  • x: The first vector, whose levels form the rows of the table
  • y: The second vector, whose levels form the columns of the table
  • digits: The number of digits to display after the decimal point for the cell proportions
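
As a quick illustration of these arguments, here is a minimal sketch that is not part of the survey example. It assumes gmodels is already installed and borrows the cyl and gear columns of the built-in mtcars dataset purely as two convenient categorical vectors.

```r
library(gmodels)

# mtcars is built into R; cyl and gear serve here only as two
# categorical vectors (an assumption of this sketch).
# digits = 2 prints the cell proportions with two decimal places.
res <- CrossTable(x = mtcars$cyl, y = mtcars$gear, digits = 2)

# CrossTable() invisibly returns a list: t holds the raw counts, while
# prop.row, prop.col and prop.tbl hold the three sets of proportions.
res$t
round(res$prop.row, 2)
```

Storing the return value is optional. It can be handy when you want to reuse the counts or proportions instead of reading them off the printed table.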

The following example shows how to use the CrossTable() function from the gmodels package in practice.

Note: Before using the CrossTable() function, you may need to first install the gmodels package by using the following syntax:

install.packages('gmodels')

Once the gmodels package is installed, you can use the CrossTable() function.

Example: How to Use the CrossTable() Function in R

Suppose that we want to know whether or not gender is associated with political party preference.

To test this, we take a simple random sample of 20 voters and survey them on their political party preference.

We can create the following data frame to hold the results of the survey:

#create data frame
df <- data.frame(gen=rep(c('M', 'F'), each=10),
                 pol=c('D', 'D', 'D', 'D', 'R', 'R', 'I', 'I', 'I', 'D',
                       'I', 'R', 'R', 'D', 'D', 'D', 'D', 'D', 'R', 'I'))

#view data frame
df

   gen pol
1    M   D
2    M   D
3    M   D
4    M   D
5    M   R
6    M   R
7    M   I
8    M   I
9    M   I
10   M   D
11   F   I
12   F   R
13   F   R
14   F   D
15   F   D
16   F   D
17   F   D
18   F   D
19   F   R
20   F   I

The gen column contains the gender of the survey respondent (M = Male, F = Female) and the pol column contains the political preference of the survey respondent (D = Democrat, I = Independent, R = Republican).

Suppose that we would like to perform a cross-tabulation to summarize how often each combination of gender and political preference occurs in this data frame.

We can use the CrossTable() function from the gmodels package to do so:

library(gmodels)

#perform cross-tabulation of gender and political preference
CrossTable(x=df$gen, y=df$pol)

   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  20 

 
             | df$pol 
      df$gen |         D |         I |         R | Row Total | 
-------------|-----------|-----------|-----------|-----------|
           F |         5 |         2 |         3 |        10 | 
             |     0.000 |     0.100 |     0.100 |           | 
             |     0.500 |     0.200 |     0.300 |     0.500 | 
             |     0.500 |     0.400 |     0.600 |           | 
             |     0.250 |     0.100 |     0.150 |           | 
-------------|-----------|-----------|-----------|-----------|
           M |         5 |         3 |         2 |        10 | 
             |     0.000 |     0.100 |     0.100 |           | 
             |     0.500 |     0.300 |     0.200 |     0.500 | 
             |     0.500 |     0.600 |     0.400 |           | 
             |     0.250 |     0.150 |     0.100 |           | 
-------------|-----------|-----------|-----------|-----------|
Column Total |        10 |         5 |         5 |        20 | 
             |     0.500 |     0.250 |     0.250 |           | 
-------------|-----------|-----------|-----------|-----------|

The output shows the cross-tabulation of the categorical variables.

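The first value in each cell shows the frequency (N). The second value shows the cell's contribution to the Chi-Square statistic, and the remaining three values show the cell's proportion of the row total, the column total, and the table total, in the order listed under Cell Contents.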

For example, consider the first cell in the top left corner. This cell describes Female Democrats.

We can see:

  • There are 5 total Female Democrats.
  • The contribution to the Chi-Square statistic is 0.
  • Female Democrats account for 50% of all Females.
  • Female Democrats account for 50% of all Democrats.
  • Female Democrats account for 25% of all individuals.
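
To see where these five numbers come from, the sketch below recomputes them with base R only. The variable names (counts, expected, fd, and so on) are illustrative, and the chi-square contribution uses the standard (observed - expected)^2 / expected formula. This is assumed to match what CrossTable() reports, and it does agree with the output above.

```r
# Recreate the survey data from the example
df <- data.frame(gen = rep(c("M", "F"), each = 10),
                 pol = c("D", "D", "D", "D", "R", "R", "I", "I", "I", "D",
                         "I", "R", "R", "D", "D", "D", "D", "D", "R", "I"))

# Raw frequencies (N) for each gender / party combination
counts <- table(df$gen, df$pol)

# Expected counts under independence: row total * column total / table total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Per-cell chi-square contribution and the three kinds of proportions
chisq_contrib <- (counts - expected)^2 / expected
row_prop <- prop.table(counts, margin = 1)  # N / Row Total
col_prop <- prop.table(counts, margin = 2)  # N / Col Total
tbl_prop <- prop.table(counts)              # N / Table Total

# Collect the five values for the Female Democrat cell
fd <- c(N     = counts["F", "D"],
        chisq = chisq_contrib["F", "D"],
        row   = row_prop["F", "D"],
        col   = col_prop["F", "D"],
        total = tbl_prop["F", "D"])
fd
```

The contribution is 0 because the observed count of 5 equals the expected count of 10 * 10 / 20 = 5.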

Each cell in the table can be interpreted in a similar manner.
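
Since the original question was whether gender is associated with party preference, it may help to request a formal test and trim the printed output. The sketch below assumes the prop.chisq, prop.t, chisq and dnn arguments behave as described in the gmodels documentation. The labels "Gender" and "Party" are our own choice.

```r
library(gmodels)

# Recreate the survey data from the example
df <- data.frame(gen = rep(c("M", "F"), each = 10),
                 pol = c("D", "D", "D", "D", "R", "R", "I", "I", "I", "D",
                         "I", "R", "R", "D", "D", "D", "D", "D", "R", "I"))

# Drop the chi-square contribution and table proportion from each cell,
# label the two dimensions, and request a chi-square test of independence
res <- CrossTable(x = df$gen, y = df$pol,
                  prop.chisq = FALSE,
                  prop.t = FALSE,
                  chisq = TRUE,
                  dnn = c("Gender", "Party"))

# The test statistic is the sum of the six cell contributions shown in
# the full table above: 0 + 0.1 + 0.1 + 0 + 0.1 + 0.1 = 0.4
res$chisq$statistic
res$chisq$p.value
```

Four of the six cells have an expected count of only 2.5, so the chi-square approximation is rough for a sample this small. Setting fisher = TRUE asks CrossTable() for Fisher's exact test, which may be a better fit here.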

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Use slice_min() in dplyr
How to Use the pull() Function in dplyr
How to Use top_n() in dplyr
How to Rename Columns Using dplyr
