How to Create a Population Pyramid in R

This tutorial explains how to easily create a population pyramid in R.

What is a Population Pyramid?

A population pyramid is a graph that shows the age and gender distribution of a given population. It is a useful chart for easily understanding the make-up of a population as well as the current trend in population growth.

If a population pyramid has a rectangular shape, it’s an indication that a population is growing at a slower rate; older generations are being replaced by new generations of roughly the same size.

If a population pyramid has a pyramid shape, it’s an indication that a population is growing at a faster rate; older generations are producing larger new generations.

Within the chart, the two genders are shown on the left and right sides, age is shown on the y-axis, and the percentage or count of the population is shown on the x-axis.

Let’s walk through an example of how to create a population pyramid in R.

Creating a Population Pyramid in R

Suppose we have the following dataset that shows the percentage make-up of a population according to age (1 to 100 years) and gender (M = “Male”, F = “Female”):

#make this example reproducible
set.seed(1)

#create data frame
data <- data.frame(age = rep(1:100, 2), gender = rep(c("M", "F"), each = 100))

#add population variable
data$population <- 1/sqrt(data$age) * runif(200, 10000, 15000)

#convert population variable to percentage
data$population <- data$population / sum(data$population) * 100

#view first six rows of dataset
head(data)

#  age gender population
#1   1      M   2.424362
#2   2      M   1.794957
#3   3      M   1.589594
#4   4      M   1.556063
#5   5      M   1.053662
#6   6      M   1.266231

#view last six rows of dataset
tail(data)

#    age gender population
#195  95      F  0.2506803
#196  96      F  0.2829385
#197  97      F  0.2292992
#198  98      F  0.3070539
#199  99      F  0.2492992
#200 100      F  0.2977980
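
Because the percentages were computed over the whole data frame, they should sum to 100 across both genders combined, not within each gender. The following is a quick sanity check, written as a sketch that rebuilds the same data frame so it can be run on its own (the names total and by_gender are chosen for illustration):

```r
#rebuild the example data so this check is self-contained
set.seed(1)
data <- data.frame(age = rep(1:100, 2), gender = rep(c("M", "F"), each = 100))
data$population <- 1/sqrt(data$age) * runif(200, 10000, 15000)
data$population <- data$population / sum(data$population) * 100

#total across all rows (should be 100, up to floating-point error)
total <- sum(data$population)

#share of the total held by each gender
by_gender <- tapply(data$population, data$gender, sum)

total
by_gender
```

If the two gender shares were each meant to sum to 100 instead, the division would need to be done separately within each gender.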

We can create a basic population pyramid for this dataset using the ggplot2 package:

#load ggplot2
library(ggplot2)

#create population pyramid
ggplot(data, aes(x = age, fill = gender,
                 y = ifelse(test = gender == "M",
                            yes = -population, no = population))) + 
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(data$population) * c(-1,1)) +
  coord_flip()

Population pyramid using ggplot2
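
The ifelse() call inside aes() is what produces the mirrored layout: male values are negated so their bars extend to the left of zero, and labels = abs hides the minus signs on the axis. An equivalent sketch stores the signed values in a column first, which keeps the aes() call shorter. The column name signed_pop is an assumption made for this example, and geom_col() is used as shorthand for geom_bar(stat = "identity"):

```r
library(ggplot2)

#rebuild the example data so this block is self-contained
set.seed(1)
data <- data.frame(age = rep(1:100, 2), gender = rep(c("M", "F"), each = 100))
data$population <- 1/sqrt(data$age) * runif(200, 10000, 15000)
data$population <- data$population / sum(data$population) * 100

#negative values for males, positive values for females
data$signed_pop <- ifelse(data$gender == "M", -data$population, data$population)

#same pyramid as before, built from the precomputed column
p <- ggplot(data, aes(x = age, y = signed_pop, fill = gender)) +
  geom_col() +
  scale_y_continuous(labels = abs, limits = max(data$population) * c(-1, 1)) +
  coord_flip()

p
```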

Modifying the Aesthetics of a Population Pyramid in R

We can also modify the aesthetics of the plot to add titles, axis labels, axis ticks, colors, and more.

Adding Titles & Labels

We can add both a title and axis labels to the population pyramid using the labs() function:

ggplot(data, aes(x = age, fill = gender,
                 y = ifelse(test = gender == "M",
                            yes = -population, no = population))) + 
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(data$population) * c(-1,1)) +
  labs(title = "Population Pyramid", x = "Age", y = "Percent of population") +
  coord_flip()

Population pyramid in R using ggplot2

Modifying the Colors

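We can change the two colors used to represent the genders with the scale_colour_manual() function. Because the bars are colored through the fill aesthetic, the aesthetics argument is set so that the same values also apply to fill: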

ggplot(data, aes(x = age, fill = gender,
                 y = ifelse(test = gender == "M",
                            yes = -population, no = population))) + 
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(data$population) * c(-1,1)) +
  labs(title = "Population Pyramid", x = "Age", y = "Percent of population") +
  scale_colour_manual(values = c("pink", "steelblue"),
                      aesthetics = c("colour", "fill")) +
  coord_flip()

Population pyramid in R with custom colors
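
With an unnamed values vector, the colors are matched to the gender levels in order, which for this data is alphabetical ("F", then "M"). The sketch below names the vector instead, so the mapping does not depend on level order. It uses scale_fill_manual(), since fill is the only color aesthetic mapped here; the name gender_colors is chosen for illustration:

```r
library(ggplot2)

#rebuild the example data so this block is self-contained
set.seed(1)
data <- data.frame(age = rep(1:100, 2), gender = rep(c("M", "F"), each = 100))
data$population <- 1/sqrt(data$age) * runif(200, 10000, 15000)
data$population <- data$population / sum(data$population) * 100

#named vector: each gender code is tied to a specific color
gender_colors <- c("F" = "pink", "M" = "steelblue")

p <- ggplot(data, aes(x = age, fill = gender,
                      y = ifelse(test = gender == "M",
                                 yes = -population, no = population))) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(data$population) * c(-1, 1)) +
  labs(title = "Population Pyramid", x = "Age", y = "Percent of population") +
  scale_fill_manual(values = gender_colors) +
  coord_flip()

p
```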

Multiple Population Pyramids

It’s also possible to plot several population pyramids together using the facet_wrap() function. For example, suppose we have demographic data for countries A, B, and C. The following code illustrates how to create one population pyramid for each country:

#make this example reproducible
set.seed(1)

#create data frame
data_multiple <- data.frame(age = rep(1:100, 6),
                   gender = rep(c("M", "F"), each = 300),
                   country = rep(c("A", "B", "C"), each = 100, times = 2))

#add population variable (the 200 random draws are recycled across the 600 rows)
data_multiple$population <- round(1/sqrt(data_multiple$age)*runif(200, 10000, 15000), 0)

#view first six rows of dataset
head(data_multiple)

#  age gender country population
#1   1      M       A      11328
#2   2      M       A       8387
#3   3      M       A       7427
#4   4      M       A       7271
#5   5      M       A       4923
#6   6      M       A       5916

#view last six rows of dataset
tail(data_multiple)

#    age gender country population
#595  95      F       C       1171
#596  96      F       C       1322
#597  97      F       C       1071
#598  98      F       C       1435
#599  99      F       C       1165
#600 100      F       C       1391

#create one population pyramid per country
ggplot(data_multiple, aes(x = age, fill = gender,
                          y = ifelse(test = gender == "M",
                                     yes = -population, no = population))) + 
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(data_multiple$population) * c(-1,1)) +
  labs(y = "Population Amount") + 
  coord_flip() +
  facet_wrap(~ country) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) #rotate x-axis labels

Population pyramids in R with facet_wrap()
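
The faceted pyramids above plot raw counts. To compare the shape of each country's distribution rather than its size, the counts can be converted to percentages within each country. The following is a sketch using base R's ave(); the column name pct and the axis label are assumptions made for this example:

```r
library(ggplot2)

#rebuild the multi-country data so this block is self-contained
set.seed(1)
data_multiple <- data.frame(age = rep(1:100, 6),
                   gender = rep(c("M", "F"), each = 300),
                   country = rep(c("A", "B", "C"), each = 100, times = 2))
data_multiple$population <- round(1/sqrt(data_multiple$age)*runif(200, 10000, 15000), 0)

#percentage of each country's total population
data_multiple$pct <- data_multiple$population /
  ave(data_multiple$population, data_multiple$country, FUN = sum) * 100

#each country's percentages should now sum to 100
country_totals <- tapply(data_multiple$pct, data_multiple$country, sum)
country_totals

#one pyramid per country, on a percentage scale
p <- ggplot(data_multiple, aes(x = age, fill = gender,
                               y = ifelse(test = gender == "M",
                                          yes = -pct, no = pct))) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(data_multiple$pct) * c(-1, 1)) +
  labs(y = "Percent of country population") +
  coord_flip() +
  facet_wrap(~ country)

p
```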

Modifying the Theme

Lastly, we can modify the theme of the charts. For example, the following code uses theme_classic() to give the charts a more minimalist look:

ggplot(data_multiple, aes(x = age, fill = gender,
                          y = ifelse(test = gender == "M",
                                     yes = -population, no = population))) + 
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(data_multiple$population) * c(-1,1)) +
  labs(y = "Population Amount") + 
  coord_flip() +
  facet_wrap(~ country) +
  theme_classic() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Classic theme in R

Alternatively, you can use one of the additional themes provided by the ggthemes package. For a complete list of the available themes, check out the ggthemes documentation.
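
As a minimal sketch of what this might look like, the code below applies theme_few() from ggthemes to the single-population pyramid. It assumes ggthemes has been installed separately (for example with install.packages("ggthemes")) and falls back to the default theme when it has not; the choice of theme_few() is only an example:

```r
library(ggplot2)

#rebuild the example data so this block is self-contained
set.seed(1)
data <- data.frame(age = rep(1:100, 2), gender = rep(c("M", "F"), each = 100))
data$population <- 1/sqrt(data$age) * runif(200, 10000, 15000)
data$population <- data$population / sum(data$population) * 100

#base pyramid with the default theme
p <- ggplot(data, aes(x = age, fill = gender,
                      y = ifelse(test = gender == "M",
                                 yes = -population, no = population))) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(data$population) * c(-1, 1)) +
  labs(title = "Population Pyramid", x = "Age", y = "Percent of population") +
  coord_flip()

#apply a ggthemes theme only if the package is available
has_ggthemes <- requireNamespace("ggthemes", quietly = TRUE)
if (has_ggthemes) {
  p <- p + ggthemes::theme_few()
}

p
```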
