The R programming language comes with several built-in datasets that are useful for practicing building models, summarizing datasets, and creating visualizations.

You can find a complete list of available built-in datasets by typing the following into your R console:

    library(help='datasets')
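If you would rather work with the list programmatically, the `data()` function returns the same information as an object. The following is a minimal sketch that assumes the standard `datasets` package that ships with R; the variable names `ds` and `ds_info` are only illustrative.

```r
# List the datasets shipped in the 'datasets' package
ds <- data(package = "datasets")

# ds$results is a character matrix with one row per dataset entry
ds_info <- as.data.frame(ds$results[, c("Item", "Title")])

# Count the entries and view the first few names and titles
nrow(ds_info)
head(ds_info)
```

The exact count depends on your version of R, so no expected output is shown here.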

There are over 50 built-in datasets. Some of the most popular ones include:

- **iris**: A dataset that contains measurements on 4 different attributes (in centimeters) for 150 flowers, 50 from each of 3 different species.
- **mtcars**: A dataset that contains measurements on 11 different attributes for 32 different cars.
- **airquality**: A dataset that contains daily air quality measurements in New York City from May to September 1973, with 153 observations and 6 variables.
- **AirPassengers**: A dataset that contains the monthly totals of international airline passengers (in thousands) from 1949 to 1960.
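You can check the size and type of each of these datasets yourself. The short sketch below uses only base R functions, and the comments show the values R reports for the versions of these datasets that ship with R.

```r
# Dimensions (rows, columns) of the three data frames
dim(iris)        # 150 rows, 5 columns (4 measurements plus Species)
dim(mtcars)      # 32 rows, 11 columns
dim(airquality)  # 153 rows, 6 columns

# AirPassengers is a monthly time series rather than a data frame
class(AirPassengers)   # "ts"
start(AirPassengers)   # year 1949, month 1
end(AirPassengers)     # year 1960, month 12
length(AirPassengers)  # 144 monthly values
```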

The following example shows how to gain a quick understanding of any of these datasets, using the **iris** dataset.

**Example: How to Analyze a Built-in Dataset in R**

One of the easiest ways to gain a quick understanding of a built-in dataset is by using the **head** function, which displays the first six rows of the dataset by default.

    #view first six rows of iris dataset
    head(iris)

      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3.6          1.4         0.2  setosa
    6          5.4         3.9          1.7         0.4  setosa
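If six rows is not the number you want, **head** accepts an `n` argument. The sketch below also uses two related base R functions, `tail()` and `str()`, which the example above does not cover; they are included only as optional companions.

```r
# Show only the first 3 rows by passing n explicitly
head(iris, n = 3)

# tail() works the same way, starting from the end of the dataset
tail(iris, n = 3)

# str() gives a compact overview: column names, types, and sample values
str(iris)
```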

You can also use the **summary** function to quickly summarize each variable in the dataset:

    #summarize iris dataset
    summary(iris)

      Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
     Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
     1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
     Median :5.800   Median :3.000   Median :4.350   Median :1.300  
     Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
     3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
     Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
           Species  
     setosa    :50  
     versicolor:50  
     virginica :50  

For each of the numeric variables we can see the following information:

- **Min**: The minimum value.
- **1st Qu**: The value of the first quartile (25th percentile).
- **Median**: The median value.
- **Mean**: The mean value.
- **3rd Qu**: The value of the third quartile (75th percentile).
- **Max**: The maximum value.
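Each of these values can also be computed individually. The following sketch reproduces the summary statistics for **Sepal.Length** using base R functions; the variable name `x` is just a shorthand introduced here.

```r
# Pull out one numeric column to work with
x <- iris$Sepal.Length

min(x)             # minimum: 4.3
quantile(x, 0.25)  # first quartile: 5.1
median(x)          # median: 5.8
mean(x)            # mean: about 5.843
quantile(x, 0.75)  # third quartile: 6.4
max(x)             # maximum: 7.9
```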

For the only categorical variable in the dataset (Species) we see a frequency count of each value:

- **setosa**: This species occurs 50 times.
- **versicolor**: This species occurs 50 times.
- **virginica**: This species occurs 50 times.
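If you only need the frequency counts, the base R `table()` function returns them directly. This is a small sketch; `species_counts` is an illustrative variable name.

```r
# Frequency count of each species
species_counts <- table(iris$Species)
species_counts

# The same counts expressed as proportions of all rows
prop.table(species_counts)
```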

You can also use the **dim** function to get the dimensions of the dataset in terms of number of rows and number of columns:

    #display rows and columns
    dim(iris)

    [1] 150   5

We can see that the dataset has **150** rows and **5** columns.
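When you need just one of the two dimensions, or the column names, base R provides separate functions for each. A brief sketch:

```r
nrow(iris)   # number of rows: 150
ncol(iris)   # number of columns: 5
names(iris)  # column names of the dataset
```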

We can also create some plots to visualize the values in the dataset.

For example, we can use the **hist()** function to create a histogram of the values for a certain variable:

    #create histogram of values for sepal length
    hist(iris$Sepal.Length,
         col='steelblue',
         main='Histogram',
         xlab='Length',
         ylab='Frequency')

This histogram allows us to visualize the distribution of values for the **Sepal.Length** variable.
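A histogram shows one variable at a time. As one possible next step, the sketch below uses the base R `boxplot()` function to compare **Sepal.Length** across the three species. The plot title and axis labels are arbitrary choices, and `bp` is an illustrative variable name that stores the statistics `boxplot()` returns.

```r
# Box plot of sepal length for each species, using the formula interface
bp <- boxplot(Sepal.Length ~ Species,
              data = iris,
              col = 'steelblue',
              main = 'Sepal Length by Species',
              xlab = 'Species',
              ylab = 'Sepal Length')

# The returned list holds the group names and the five statistics drawn for each box
bp$names
bp$stats
```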

Feel free to use each of the functions shown here to explore any of the built-in datasets in R that you’d like.
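To reuse the same checks on other datasets, you could wrap them in a small function. The helper below, `explore_dataset`, is a hypothetical name introduced for this sketch and is not part of base R; it assumes the dataset is a data frame.

```r
# Hypothetical helper (not part of base R): runs the same quick checks on any data frame
explore_dataset <- function(df) {
  print(head(df))     # first six rows
  print(dim(df))      # number of rows and columns
  print(summary(df))  # summary of each variable
  invisible(dim(df))  # return the dimensions without printing them again
}

# Apply it to another built-in dataset
explore_dataset(mtcars)
```

Time series objects such as **AirPassengers** are not data frames, so they would need different handling than this sketch provides.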
