A Complete Guide to the Built-in Datasets in R

The R programming language comes with several built-in datasets that are useful for practicing building models, summarizing datasets, and creating visualizations.

You can find a complete list of available built-in datasets by typing the following into your R console:

data()


There are over 50 built-in datasets but some of the most popular ones include:

  • iris: A dataset that contains measurements on 4 different attributes (in centimeters) for 150 flowers, 50 from each of 3 different species.
  • mtcars: A dataset that contains measurements on 11 different attributes for 32 different cars.
  • airquality: A dataset that contains daily air quality measurements in New York City from May to September 1973, with 153 observations and 6 variables.
  • AirPassengers: A dataset that contains the number of monthly airline passengers from 1949 to 1960.
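
As a quick sanity check on the descriptions above, the following sketch prints the size of each dataset. It assumes a standard R installation, where the datasets package is attached by default. Note that AirPassengers is stored as a time series (class ts) rather than a data frame, so length() is used for it instead of dim().

```r
# data frames: dim() returns the number of rows, then columns
dim(iris)        # 150 rows, 5 columns
dim(mtcars)      # 32 rows, 11 columns
dim(airquality)  # 153 rows, 6 columns

# AirPassengers is a monthly time series, not a data frame
class(AirPassengers)   # "ts"
length(AirPassengers)  # 144 monthly values (12 years x 12 months)
```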

The following example shows how to gain a quick understanding of any of these datasets, using the iris dataset for illustration.

Example: How to Analyze a Built-in Dataset in R

One of the easiest ways to gain a quick understanding of a built-in dataset is by using the head function, which allows you to view the first six rows of the dataset.

#view first six rows of iris dataset
head(iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
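
Six rows is only the default. As a small sketch, the n argument of head controls how many rows are returned, and the companion tail function returns rows from the end of the dataset instead:

```r
# show only the first three rows
head(iris, n = 3)

# show the last three rows
tail(iris, n = 3)
```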

You can also use the summary function to quickly summarize each variable in the dataset:

#summarize iris dataset
summary(iris)

  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  

For each of the numeric variables we can see the following information:

  • Min: The minimum value.
  • 1st Qu: The value of the first quartile (25th percentile).
  • Median: The median value.
  • Mean: The mean value.
  • 3rd Qu: The value of the third quartile (75th percentile).
  • Max: The maximum value.
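
If you only need these statistics for one variable, each of them can also be computed directly. The sketch below does so for Sepal.Length; the variable name x is just a placeholder chosen for this illustration.

```r
# pull out a single numeric column
x <- iris$Sepal.Length

min(x)                              # minimum value
quantile(x, probs = c(0.25, 0.75))  # first and third quartiles
median(x)                           # median value
mean(x)                             # mean value
max(x)                              # maximum value
```

The results should match the Sepal.Length column of the summary output, up to rounding.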

For the only categorical variable in the dataset (Species) we see a frequency count of each value:

  • setosa: This species occurs 50 times.
  • versicolor: This species occurs 50 times.
  • virginica: This species occurs 50 times.
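
The same frequency counts can be obtained on their own with the table function. This is a minimal sketch; counts is a placeholder name used only here.

```r
# count how many times each species occurs
counts <- table(iris$Species)
counts

# express the counts as proportions of all rows
prop.table(counts)
```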

You can also use the dim function to get the dimensions of the dataset in terms of number of rows and number of columns:

#display rows and columns
dim(iris)

[1] 150   5

We can see that the dataset has 150 rows and 5 columns.
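
If you want the two dimensions separately, or a compact overview of the column types, the base functions in the sketch below may help:

```r
# number of rows and number of columns as separate values
nrow(iris)
ncol(iris)

# column names
names(iris)

# compact overview: the type of each column and its first few values
str(iris)
```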

We can also create some plots to visualize the values in the dataset.

For example, we can use the hist() function to create a histogram of the values for a certain variable:

#create histogram of values for sepal length
hist(iris$Sepal.Length)

This histogram allows us to visualize the distribution of values for the Sepal.Length variable.
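
The appearance of the histogram can be adjusted through optional arguments to hist(). The sketch below is one possible customization; the title, axis label, color, and bin count are arbitrary choices made for illustration.

```r
# histogram of sepal length with a custom title, axis label, and color
hist(iris$Sepal.Length,
     breaks = 10,                          # suggested number of bins (R may adjust it)
     col = "steelblue",                    # fill color of the bars
     main = "Histogram of Sepal Length",   # plot title
     xlab = "Sepal length (cm)")           # x-axis label
```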

Feel free to use each of the functions shown here to explore any of the built-in datasets in R that you’d like.

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Create Summary Tables in R
How to Calculate Five Number Summary in R
How to Calculate Descriptive Statistics in R
