Sample Variance vs. Population Variance: What’s the Difference?


The variance is a way to measure the spread of values in a dataset.

The formula to calculate population variance is:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

where:

  • Σ: A symbol that means “sum”
  • μ: Population mean
  • $x_i$: The ith element from the population
  • N: Population size
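To make the population formula concrete, here is a minimal Python sketch that follows it term by term. The function name `population_variance` and the data values are made up for illustration and are not from any particular library or dataset.

```python
def population_variance(values):
    """Population variance: sigma^2 = sum((x_i - mu)^2) / N."""
    N = len(values)  # population size
    if N == 0:
        raise ValueError("population must contain at least one value")
    mu = sum(values) / N  # population mean
    # Sum the squared deviations from mu, then divide by N
    return sum((x - mu) ** 2 for x in values) / N


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration
print(population_variance(data))  # mean is 5, squared deviations sum to 32, 32 / 8 = 4.0
```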

The formula to calculate sample variance is:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where:

  • $\bar{x}$: Sample mean
  • $x_i$: The ith element from the sample
  • n: Sample size
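The sample formula can be sketched the same way. Apart from using the sample mean, the only change is the denominator. The name `sample_variance` and the data values are hypothetical and chosen only for illustration.

```python
def sample_variance(values):
    """Sample variance: s^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    n = len(values)  # sample size
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n  # sample mean
    # Same squared deviations as before, but divide by n - 1 instead of n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration
print(sample_variance(data))  # 32 / 7, about 4.571
```

On the same eight values, the sample variance (32 / 7) is slightly larger than the population variance (32 / 8) because the denominator is smaller.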

Notice that there’s only one tiny difference between the two formulas:

When we calculate population variance, we divide by N (the population size).

When we calculate sample variance, we divide by n-1 (one less than the sample size).

Dividing by n-1 rather than n when calculating the sample variance is known as Bessel’s correction.

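Without getting bogged down in the mathematical details, the idea is this: the deviations in the sample variance are measured from the sample mean, which is computed from the same data, and the sum of squared deviations from the sample mean is never larger than the sum of squared deviations from the true population mean. Dividing by n would therefore tend to underestimate the population variance. Dividing by n-1 corrects for this and can be shown to make the sample variance an unbiased estimate of the population variance, which is the value we’re usually interested in anyway.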
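The unbiasedness claim can be checked exactly on a tiny example. The sketch below assumes a hypothetical population of four values and assumes that each sample of size 3 is drawn independently with replacement, so that every ordered sample is equally likely. It averages both versions of the variance over all such samples, using exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    """Exact arithmetic mean of a sequence of Fractions."""
    return sum(values, Fraction(0)) / len(values)


def sum_sq_dev(values):
    """Sum of squared deviations from the mean of `values`."""
    m = mean(values)
    return sum(((x - m) ** 2 for x in values), Fraction(0))


population = [Fraction(v) for v in (1, 2, 3, 4)]  # hypothetical toy population
sigma2 = sum_sq_dev(population) / len(population)  # true population variance

n = 3  # sample size
# Every ordered sample of size n drawn with replacement; each is equally likely
samples = list(product(population, repeat=n))

avg_divide_by_n = mean([sum_sq_dev(s) / n for s in samples])
avg_divide_by_n_minus_1 = mean([sum_sq_dev(s) / (n - 1) for s in samples])

print("population variance:", sigma2)                            # 5/4
print("average when dividing by n:", avg_divide_by_n)            # 5/6
print("average when dividing by n-1:", avg_divide_by_n_minus_1)  # 5/4
```

Dividing by n gives an average of 5/6, which is (n-1)/n times the true value 5/4. Dividing by n-1 recovers 5/4 exactly. If samples are drawn without replacement from a finite population, the relationship changes, so this check depends on the with-replacement assumption.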

When to Calculate Sample Variance vs. Population Variance

If you’re unsure of whether you should calculate the sample variance or the population variance, keep this rule of thumb in mind:

You should calculate the sample variance when the dataset you’re working with represents a sample taken from a larger population of interest.

You should calculate the population variance when the dataset you’re working with represents an entire population, i.e., every value that you’re interested in.
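In Python, the standard library’s `statistics` module provides both versions, so applying the rule of thumb comes down to choosing the right function. `statistics.pvariance` divides by N and `statistics.variance` divides by n-1. The data values below are made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

# The values are the entire population of interest: divide by N
print(statistics.pvariance(data))  # 32 / 8 = 4

# The values are a sample from a larger population: divide by n - 1
print(statistics.variance(data))  # 32 / 7, about 4.571
```

If you use NumPy instead, note that `numpy.var` divides by N by default. Passing `ddof=1` gives the sample variance.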

The following examples show different scenarios of when to calculate the sample variance vs. the population variance.

Example: Calculating Sample Variance

Suppose a botanist wants to calculate the variance in height of a certain species of plants. Because there are thousands of individual plants in one region, she decides to take a simple random sample of 20 plants and measure each of their heights.

In this scenario, the botanist should calculate the sample variance because she is interested in the variance of the entire population of plants but is simply using this sample to estimate the true population variance.

Example: Calculating Population Variance

Suppose a teacher wants to calculate the variance of exam scores for the 20 students in her class.

In this scenario, the teacher should calculate the population variance because the dataset she’s working with (the 20 exam scores) represents the entire population that she is interested in.

Additional Resources

The following tutorials explain how to calculate sample variance and population variance in different statistical software:

How to Calculate Sample & Population Variance in Excel
How to Calculate Sample & Population Variance in R
How to Calculate Sample & Population Variance in Python
