Often you may want to calculate a cross-tabulation table in R to summarize the relationship between two categorical variables.

Fortunately, this is easy to do by using the **CrossTable()** function from the **gmodels** package in R, which is designed to perform this exact task.

The **CrossTable()** function uses the following basic syntax:

**CrossTable(x, y, digits=3, …)**

where:

- **x**: The name of the first vector
- **y**: The name of the second vector
- **digits**: The number of digits to display after the decimal point for the cell proportions

The following example shows how to use the **CrossTable()** function from the **gmodels** package in practice.

**Note**: Before using the **CrossTable()** function, you may need to first install the **gmodels** package by using the following syntax:

    install.packages('gmodels')

Once the **gmodels** package is installed, you can load it with **library(gmodels)** and use the **CrossTable()** function.
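As a quick illustration of the syntax, the sketch below calls **CrossTable()** on two short made-up vectors (`smoker` and `group` are hypothetical and are not part of this tutorial's data). Besides `digits`, it uses the `prop.chisq` and `prop.t` arguments of **CrossTable()** to drop the chi-square contribution and table-proportion lines from each cell. The call is wrapped in `requireNamespace()` so that it only runs when gmodels is installed.

```r
# Hypothetical toy vectors, used only to illustrate the call
smoker <- c("yes", "no", "no", "yes", "no", "no")
group  <- c("A", "A", "B", "B", "B", "A")

# Run the call only if the gmodels package is available
if (requireNamespace("gmodels", quietly = TRUE)) {
  res <- gmodels::CrossTable(
    x = smoker,
    y = group,
    digits = 2,          # show two decimals for the proportions
    prop.chisq = FALSE,  # omit the chi-square contribution line
    prop.t = FALSE       # omit the N / Table Total line
  )

  # The counts are also returned (invisibly) as a table
  print(res$t)
}
```

**CrossTable()** returns its results invisibly as a list, so the raw counts remain available in `res$t` for further work.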

**Example: How to Use the CrossTable() Function in R**

Suppose that we want to know whether or not gender is associated with political party preference.

To test this, we take a simple random sample of 20 voters and survey them on their political party preference.

We can create the following data frame to hold the results of the survey:

    #create data frame
    df <- data.frame(gen=rep(c('M', 'F'), each=10),
                     pol=c('D', 'D', 'D', 'D', 'R', 'R', 'I', 'I', 'I', 'D',
                           'I', 'R', 'R', 'D', 'D', 'D', 'D', 'D', 'R', 'I'))

    #view data frame
    df

       gen pol
    1    M   D
    2    M   D
    3    M   D
    4    M   D
    5    M   R
    6    M   R
    7    M   I
    8    M   I
    9    M   I
    10   M   D
    11   F   I
    12   F   R
    13   F   R
    14   F   D
    15   F   D
    16   F   D
    17   F   D
    18   F   D
    19   F   R
    20   F   I

The **gen** column contains the gender of the survey respondent (M = Male, F = Female) and the **pol** column contains the political preference of the survey respondent (D = Democrat, I = Independent, R = Republican).

Suppose that we would like to perform a cross-tabulation to summarize how many respondents fall into each combination of gender and political preference.

We can use the **CrossTable()** function from the **gmodels** package to do so:

    library(gmodels)

    #perform cross-tabulation of gender and political preference
    CrossTable(x=df$gen, y=df$pol)

       Cell Contents
    |-------------------------|
    |                       N |
    | Chi-square contribution |
    |           N / Row Total |
    |           N / Col Total |
    |         N / Table Total |
    |-------------------------|

    Total Observations in Table:  20

                 | df$pol
          df$gen |         D |         I |         R | Row Total |
    -------------|-----------|-----------|-----------|-----------|
               F |         5 |         2 |         3 |        10 |
                 |     0.000 |     0.100 |     0.100 |           |
                 |     0.500 |     0.200 |     0.300 |     0.500 |
                 |     0.500 |     0.400 |     0.600 |           |
                 |     0.250 |     0.100 |     0.150 |           |
    -------------|-----------|-----------|-----------|-----------|
               M |         5 |         3 |         2 |        10 |
                 |     0.000 |     0.100 |     0.100 |           |
                 |     0.500 |     0.300 |     0.200 |     0.500 |
                 |     0.500 |     0.600 |     0.400 |           |
                 |     0.250 |     0.150 |     0.100 |           |
    -------------|-----------|-----------|-----------|-----------|
    Column Total |        10 |         5 |         5 |        20 |
                 |     0.500 |     0.250 |     0.250 |           |
    -------------|-----------|-----------|-----------|-----------|

The output shows the cross-tabulation of the categorical variables.

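The first value in each cell shows the frequency (count), the second shows that cell's contribution to the Chi-Square statistic, and the remaining three show the cell's proportion of its row total, its column total, and the table total.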

For example, consider the first cell in the top left corner. This cell describes Female Democrats.

We can see:

- There are **5** total Female Democrats.
- The contribution to the Chi-Square statistic is **0**.
- Female Democrats account for **50%** of all Females.
- Female Democrats account for **50%** of all Democrats.
- Female Democrats account for **25%** of all individuals.
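These five figures can also be reproduced with base R, which shows how each line of a cell is calculated. The sketch below rebuilds the example data frame and computes each quantity by hand; the variable names (`counts`, `expected`, `chisq_contrib`, and so on) are chosen here purely for illustration.

```r
# Recreate the survey data from the example
df <- data.frame(gen = rep(c('M', 'F'), each = 10),
                 pol = c('D', 'D', 'D', 'D', 'R', 'R', 'I', 'I', 'I', 'D',
                         'I', 'R', 'R', 'D', 'D', 'D', 'D', 'D', 'R', 'I'))

# Raw counts: the first line of each cell
counts <- table(df$gen, df$pol)

# Expected counts under independence:
# (row total * column total) / table total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Chi-square contribution of each cell: (observed - expected)^2 / expected
chisq_contrib <- (counts - expected)^2 / expected

# Proportions of the row total, column total, and table total
row_prop   <- prop.table(counts, margin = 1)
col_prop   <- prop.table(counts, margin = 2)
table_prop <- prop.table(counts)

# Collect the five values for the Female Democrat cell
c(N           = counts["F", "D"],         # 5
  chisq       = chisq_contrib["F", "D"],  # 0
  row_total   = row_prop["F", "D"],       # 0.5
  col_total   = col_prop["F", "D"],       # 0.5
  table_total = table_prop["F", "D"])     # 0.25
```

The chi-square contribution is 0 here because the observed count of 5 equals the expected count of 10 * 10 / 20 = 5.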

Each cell in the table can be interpreted in a similar manner.
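The example began by asking whether gender is associated with political party preference. The cross-tabulation alone does not answer that, so the sketch below shows one possible follow-up using the base R functions `chisq.test()` and `fisher.test()`. It is an illustration added here under the assumption that a test of independence is what is wanted, and the object names `counts` and `chi` are arbitrary.

```r
# Recreate the survey data from the example
df <- data.frame(gen = rep(c('M', 'F'), each = 10),
                 pol = c('D', 'D', 'D', 'D', 'R', 'R', 'I', 'I', 'I', 'D',
                         'I', 'R', 'R', 'D', 'D', 'D', 'D', 'D', 'R', 'I'))

# Contingency table of gender by political preference
counts <- table(df$gen, df$pol)

# Pearson chi-square test of independence.
# R emits a warning here because some expected counts are below 5.
chi <- chisq.test(counts)
chi$statistic  # X-squared = 0.4, the sum of the six cell contributions
chi$parameter  # df = 2
chi$p.value    # about 0.82

# Fisher's exact test does not rely on the large-sample approximation
fisher.test(counts)$p.value
```

With a chi-square p-value of about 0.82, this sample gives no evidence of an association between gender and party preference. Because the expected counts for Independents and Republicans are only 2.5, the exact test is arguably the safer choice for data this small. **CrossTable()** also accepts `chisq = TRUE` and `fisher = TRUE` if you would rather have these tests printed beneath the table.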
