When we analyze a dataset, we often care about two things:

**1.** Where the “center” value is located. We often measure the “center” using the mean and median.

**2.** How “spread out” the values are. We measure “spread” using **range**, **interquartile range**, **variance**, and **standard deviation**.

**Range**

The **range** is the difference between the largest and smallest value in a dataset.

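Suppose we have this dataset of final math exam scores for 20 students (listed here from smallest to largest):

58, 66, 71, 73, 74, 77, 78, 82, 84, 85, 88, 88, 88, 90, 90, 92, 92, 94, 96, 98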

The largest value is 98. The smallest value is 58. Thus, the range is 98 – 58 = **40**.
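As a quick check of the arithmetic, here is a minimal Python sketch of the range calculation. The helper name `value_range` is made up for this illustration; the scores are the 20 exam scores used throughout this article.

```python
# Final exam scores for the 20 students in the article.
scores = [58, 66, 71, 73, 74, 77, 78, 82, 84, 85,
          88, 88, 88, 90, 90, 92, 92, 94, 96, 98]


def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


print(value_range(scores))  # 98 - 58 = 40
```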

**Interquartile Range**

The **interquartile range** is the difference between the third quartile (Q3) and the first quartile (Q1) in a dataset.

Quartiles are values that split up a dataset into four equal parts. Here is how to find the interquartile range of the following dataset of exam scores:

**1. Arrange the values from smallest to largest.**

58, 66, 71, 73, 74, 77, 78, 82, 84, 85, 88, 88, 88, 90, 90, 92, 92, 94, 96, 98

**2. Find the median.** (In this case, it’s the average of the middle two values, 85 and 88, which is 86.5.)

58, 66, 71, 73, 74, 77, 78, 82, 84, **85 (MEDIAN) 88**, 88, 88, 90, 90, 92, 92, 94, 96, 98

**3. The median splits the dataset into two halves. The median of the lower half is the lower quartile (Q1) and the median of the upper half is the upper quartile (Q3).**

58, 66, 71, 73, **74, 77**, 78, 82, 84, 85, 88, 88, 88, 90, **90, 92**, 92, 94, 96, 98

**4. The interquartile range is equal to Q3 – Q1.**

In this case, Q1 is the average of the middle two values in the lower half of the dataset, (74 + 77) / 2 = 75.5, and Q3 is the average of the middle two values in the upper half of the dataset, (90 + 92) / 2 = 91.

Thus, the interquartile range is 91 – 75.5 = **15.5**.
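The four steps above can be sketched in Python as follows. This is only one way to code the median-of-halves method; the function name `iqr_median_of_halves` is hypothetical, and leaving the middle value out of both halves when the count is odd is an assumption of this sketch (the article's dataset has an even count, so the question does not arise there).

```python
from statistics import median

# Final exam scores for the 20 students in the article.
scores = [58, 66, 71, 73, 74, 77, 78, 82, 84, 85,
          88, 88, 88, 90, 90, 92, 92, 94, 96, 98]


def iqr_median_of_halves(values):
    """Return (Q1, Q3, IQR) using the median-of-halves method."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)      # step 1: arrange smallest to largest
    half = len(ordered) // 2      # step 2: the median sits between the halves
    lower = ordered[:half]        # step 3: lower half (middle value excluded if odd)
    upper = ordered[-half:]       #         upper half (middle value excluded if odd)
    q1, q3 = median(lower), median(upper)
    return q1, q3, q3 - q1        # step 4: IQR = Q3 - Q1


q1, q3, iqr = iqr_median_of_halves(scores)
print(q1, q3, iqr)  # 75.5 91.0 15.5
```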

**Interquartile Range vs. Range**

The interquartile range is more resistant to outliers than the range, which can make it a better metric for measuring “spread.”

For example, suppose we have the following dataset with incomes for ten people:

The range is $2,468,000, but the interquartile range is $34,000, which is a much better indication of how spread out the incomes actually are.

In this case, the outlier income of person J causes the range to be extremely large and makes it a poor indicator of “spread” for these incomes.

**Variance**

The **variance** is a common way to measure how spread out data values are. It is the average squared distance of the values from the mean.

The formula to find the variance of a population (denoted as **σ²**) is:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

where μ is the population mean, xᵢ is the i-th element from the population, N is the population size, and Σ is just a fancy symbol that means “sum.”

Usually we work with samples, not populations. The formula to find the variance of a sample (denoted as **s²**) is:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where x̄ is the sample mean and n is the sample size.
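To make the difference between the two formulas concrete, here is a small Python sketch that applies both to the 20 exam scores. Treating the same scores first as a whole population and then as a sample is purely for comparison. The function names `population_variance` and `sample_variance` are illustrative; the standard library's `statistics.pvariance` and `statistics.variance` are shown as a cross-check.

```python
from statistics import mean, pvariance, variance

# Final exam scores for the 20 students in the article.
scores = [58, 66, 71, 73, 74, 77, 78, 82, 84, 85,
          88, 88, 88, 90, 90, 92, 92, 94, 96, 98]


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


pop_var = population_variance(scores)
samp_var = sample_variance(scores)

print(round(pop_var, 2), round(samp_var, 2))
# Cross-check against the standard library implementations.
print(round(pvariance(scores), 2), round(variance(scores), 2))
```

Because the sample formula divides by n - 1 instead of N, it always gives a slightly larger value than the population formula on the same data.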

**Standard Deviation**

The **standard deviation** is the square root of the variance. It’s the most common way to measure how “spread out” data values are.

The formula to find the standard deviation of a population (denoted as **σ**) is:

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

And the formula to find the standard deviation of a sample (denoted as **s**) is:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
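As an illustration, the sketch below takes the square root of each variance for the 20 exam scores and checks the results against `statistics.pstdev` and `statistics.stdev` from Python's standard library. The variable names are illustrative, and using the same scores as both a population and a sample is only for comparison.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

# Final exam scores for the 20 students in the article.
scores = [58, 66, 71, 73, 74, 77, 78, 82, 84, 85,
          88, 88, 88, 90, 90, 92, 92, 94, 96, 98]

# Standard deviation = square root of the variance.
pop_sd = math.sqrt(pvariance(scores))   # population: divides by N
samp_sd = math.sqrt(variance(scores))   # sample: divides by n - 1

print(round(pop_sd, 2), round(samp_sd, 2))

# The dedicated library functions should agree with the square roots.
print(math.isclose(pop_sd, pstdev(scores)), math.isclose(samp_sd, stdev(scores)))
```

Unlike the variance, the standard deviation is expressed in the same units as the data (here, exam points).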

Hi, why is the interquartile range I calculated manually different from the result in R?

> quantile(data, 0.75) - quantile(data, 0.25)

75%

14.25

> IQR(data)

[1] 14.25
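A likely explanation, assuming `data` holds the 20 exam scores from the article: quartiles have more than one accepted definition. The walkthrough takes the median of each half, while R's `quantile()` and `IQR()` default to linear interpolation between neighboring sorted values (R calls this type 7). Neither result is wrong. The Python sketch below reproduces both numbers. It uses `statistics.quantiles` with `method="inclusive"` as a stand-in for the interpolating approach, which gives the same values as the R output on this dataset.

```python
from statistics import median, quantiles

# Assumption: these are the values stored in `data` in the R session above.
scores = [58, 66, 71, 73, 74, 77, 78, 82, 84, 85,
          88, 88, 88, 90, 90, 92, 92, 94, 96, 98]

# Convention 1: median of each half (the manual method in the article).
ordered = sorted(scores)
half = len(ordered) // 2
manual_iqr = median(ordered[-half:]) - median(ordered[:half])

# Convention 2: linear interpolation between neighboring sorted values.
q1, _, q3 = quantiles(scores, n=4, method="inclusive")
interpolated_iqr = q3 - q1

print(manual_iqr)                # 15.5
print(q1, q3, interpolated_iqr)  # 76.25 90.5 14.25
```

With interpolation, Q1 lands three quarters of the way from 74 to 77 (76.25) and Q3 a quarter of the way from 90 to 92 (90.5), so the IQR comes out as 14.25 rather than 15.5.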