The standard deviation is used to measure the spread of values in a sample.
We can use the following formula to calculate the standard deviation of a given sample:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
- Σ: A symbol that means “sum”
- xᵢ: The ith value in the sample
- x̄: The mean of the sample
- n: The sample size
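The sample standard deviation sums the squared deviations from the mean, divides by n − 1, and takes the square root. This translates directly into a few lines of Python. The sketch below is only an illustration: the function name `sample_std` and the sample values are hypothetical. The result is cross-checked against the standard library's `statistics.stdev`, which also divides by n − 1.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation: sqrt(sum((x_i - xbar)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    xbar = sum(values) / n                                # sample mean
    squared_devs = sum((x - xbar) ** 2 for x in values)   # sum of (x_i - xbar)^2
    return math.sqrt(squared_devs / (n - 1))              # divide by n - 1, then take the root


# Hypothetical sample, used only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_std(sample), 4))         # 2.1381
print(round(statistics.stdev(sample), 4))   # 2.1381 (standard-library cross-check)
```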
The higher the standard deviation, the more spread out the values in a sample are. Conversely, the lower the standard deviation, the more tightly packed together the values are.
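To see this relationship concretely, here is a small sketch with two hypothetical samples that share the same mean but differ in how far their values sit from it. Both the data and the variable names are made up for illustration.

```python
import statistics

# Two hypothetical samples with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
spread_out = [30, 40, 50, 60, 70]

for name, data in [("tight", tight), ("spread_out", spread_out)]:
    # Same mean, but the sample standard deviation grows with the spread.
    print(name, statistics.mean(data), round(statistics.stdev(data), 2))
# tight 50 1.58
# spread_out 50 15.81
```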
One question students often have is: What is considered a good value for the standard deviation?
The answer: A standard deviation can’t be “good” or “bad” because it simply tells us how spread out the values are in a sample.
There’s also no universal number that determines whether or not a standard deviation is “high” or “low.” For example, consider the following scenarios:
Scenario 1: A realtor collects data on the price of 100 houses in her city and finds that the standard deviation of prices is $12,000.
Scenario 2: An economist measures the total income tax collected in all 50 states in the U.S. and finds that the standard deviation of total income tax collected is $480,000.
Although the standard deviation in scenario 2 is much higher than the standard deviation in scenario 1, the values being measured in scenario 2 are also much larger, since the total taxes collected by a state are far larger than individual house prices.
This means there’s no single number we can use to tell whether or not a standard deviation is “good” or “bad” or even “high” or “low” because it depends on the situation.
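One way to see why the size of a standard deviation depends on the situation: expressing the same data in different units changes the standard deviation by the same factor. The sketch below uses hypothetical house prices, first in dollars and then in thousands of dollars. The numbers are illustrative only.

```python
import statistics

# Hypothetical house prices in dollars (illustrative values only).
prices_dollars = [140_000, 150_000, 160_000]
# The same prices expressed in thousands of dollars.
prices_thousands = [p / 1000 for p in prices_dollars]

sd_dollars = statistics.stdev(prices_dollars)
sd_thousands = statistics.stdev(prices_thousands)
# Identical data, yet one standard deviation is 1000 times the other.
print(sd_dollars, sd_thousands)
```

Neither value is "high" or "low" on its own. Each only has meaning relative to the scale of the data it describes.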
Using the Coefficient of Variation
One way to determine if a standard deviation is high is to compare it to the mean of the dataset.
A coefficient of variation, often abbreviated as CV, is a way to measure how spread out values are in a dataset relative to the mean. It is calculated as:
$$CV = \frac{s}{\bar{x}}$$
- s: The standard deviation of the dataset
- x̄: The mean of the dataset
In simple terms, the CV is the ratio between the standard deviation and the mean.
The higher the CV, the higher the standard deviation relative to the mean. As a rule of thumb, a CV greater than 1 is often considered high.
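As a minimal sketch, the CV definition and the "greater than 1" rule of thumb can be written as two small Python functions. The function names and the sample data are hypothetical. The sketch assumes positive-valued data, since the CV is undefined when the mean is zero and hard to interpret when it is negative.

```python
import statistics


def coefficient_of_variation(values):
    """Return CV = s / x̄, using the sample standard deviation s."""
    mean = statistics.mean(values)
    if mean == 0:
        # Dividing by a zero mean is undefined.
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


def is_high_cv(cv, threshold=1.0):
    """Rule of thumb: a CV greater than 1 is often considered high."""
    return cv > threshold


# Hypothetical sample, used only for illustration.
data = [140_000, 150_000, 160_000]
cv = coefficient_of_variation(data)
print(round(cv, 4), is_high_cv(cv))  # 0.0667 False
```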
For example, suppose a realtor collects data on the price of 100 houses in her city and finds that the mean price is $150,000 and the standard deviation of prices is $12,000. The CV would be calculated as:
- CV = $12,000 / $150,000 = 0.08
Since this CV is well below 1, it tells us that the standard deviation is quite low relative to the mean.
Conversely, suppose an economist measures the total income tax collected in all 50 states in the U.S. and finds that the mean is $400,000 and the standard deviation is $480,000. The CV would be calculated as:
- CV = $480,000 / $400,000 = 1.2
Since this CV is greater than 1, it tells us that the standard deviation is quite high relative to the mean.
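Both calculations can be reproduced directly from the summary statistics quoted in the two examples. The sketch below applies the same "CV greater than 1 is often considered high" rule of thumb. The dictionary keys are just illustrative labels.

```python
# Summary statistics (mean and standard deviation) from the two examples.
examples = {
    "house prices": {"mean": 150_000, "sd": 12_000},
    "state income tax": {"mean": 400_000, "sd": 480_000},
}

# CV = standard deviation / mean for each example.
cvs = {name: s["sd"] / s["mean"] for name, s in examples.items()}

for name, cv in cvs.items():
    label = "high" if cv > 1 else "not high"  # rule of thumb: CV > 1 is often considered high
    print(f"{name}: CV = {cv:.2f} ({label})")
# house prices: CV = 0.08 (not high)
# state income tax: CV = 1.20 (high)
```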
Comparing Standard Deviations Across Datasets
Often we use the standard deviation to compare the spread of values across different datasets.
For example, suppose a professor administers three exams to his students during the course of one semester. He then calculates the sample standard deviation of scores for each exam:
- Sample standard deviation of Exam 1 Scores: 4.6
- Sample standard deviation of Exam 2 Scores: 12.4
- Sample standard deviation of Exam 3 Scores: 2.3
This tells the professor that the exam scores were most spread out for Exam 2 while the scores were most tightly packed together for Exam 3.
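This comparison amounts to picking the largest and smallest standard deviation. The short sketch below does that with the three values reported for the exams. It assumes all three exams are scored on the same scale. If they were not, comparing raw standard deviations could mislead, and a scale-free measure such as the CV would be a fairer basis.

```python
# Sample standard deviations reported for each exam.
exam_sd = {"Exam 1": 4.6, "Exam 2": 12.4, "Exam 3": 2.3}

most_spread = max(exam_sd, key=exam_sd.get)   # largest SD: most spread out scores
least_spread = min(exam_sd, key=exam_sd.get)  # smallest SD: most tightly packed scores
print(most_spread, least_spread)  # Exam 2 Exam 3
```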