# How to Calculate Sample & Population Variance in Python

The variance is a way to measure the spread of values in a dataset.

The formula to calculate population variance is:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where:

• Σ: A symbol that means “sum”
• μ: Population mean
• $x_i$: The ith element from the population
• N: Population size
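To see how the formula maps to code, here is a minimal sketch that computes the population variance by hand. The function name `population_variance` and the small dataset are made up for illustration and are not part of any library.

```python
def population_variance(values):
    """Population variance: the mean of the squared deviations from the mean."""
    n = len(values)            # N: population size
    mu = sum(values) / n       # mu: population mean
    # sum the squared deviations from the mean, then divide by N
    return sum((x - mu) ** 2 for x in values) / n

# hypothetical dataset chosen for illustration
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(values))  # 4.0
```

The mean of this dataset is 5, the squared deviations sum to 32, and 32 / 8 = 4.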

The formula to calculate sample variance is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where:

• $\bar{x}$: Sample mean
• $x_i$: The ith element from the sample
• n: Sample size
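The sample formula can be sketched the same way. The only change from the population formula is the n-1 divisor. The helper name `sample_variance` and the dataset below are assumptions made for illustration; the result is checked against `statistics.variance`.

```python
import math
from statistics import variance

def sample_variance(values):
    """Sample variance: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        # with a single value, n - 1 is zero and the formula is undefined
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n    # sample mean
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

# hypothetical dataset chosen for illustration
values = [2, 4, 4, 4, 5, 5, 7, 9]
manual = sample_variance(values)
print(round(manual, 3))                        # 4.571
print(math.isclose(manual, variance(values)))  # True
```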

We can use the `variance` and `pvariance` functions from Python's built-in `statistics` module to calculate the sample variance and population variance, respectively, of a list of numbers.

```python
from statistics import variance, pvariance

#x is a list (or other iterable) of numbers

#calculate sample variance
variance(x)

#calculate population variance
pvariance(x)
```

The following examples show how to use each function in practice.

### Example 1: Calculating Sample Variance in Python

The following code shows how to calculate the sample variance of an array in Python:

```python
from statistics import variance

#define data
data = [4, 8, 12, 15, 9, 6, 14, 18, 12, 9, 16, 17, 17, 20, 14]

#calculate sample variance, rounded to three decimal places
print(round(variance(data), 3))

#22.067
```

The sample variance turns out to be 22.067.
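Because the sample formula divides by n-1, `variance` needs at least two values and raises `StatisticsError` otherwise. The sketch below shows one way to guard against that; `safe_sample_variance` is a hypothetical helper written for this example, not a function in the `statistics` module.

```python
from statistics import StatisticsError, variance

def safe_sample_variance(values):
    """Return the sample variance, or None when there are too few values."""
    try:
        return variance(values)
    except StatisticsError:
        # raised when fewer than two data points are supplied
        return None

print(safe_sample_variance([7.0]))       # None
print(safe_sample_variance([4.0, 8.0]))  # 8.0
```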

### Example 2: Calculating Population Variance in Python

The following code shows how to calculate the population variance of an array in Python:

```python
from statistics import pvariance

#define data
data = [4, 8, 12, 15, 9, 6, 14, 18, 12, 9, 16, 17, 17, 20, 14]

#calculate population variance, rounded to three decimal places
print(round(pvariance(data), 3))

#20.596
```

The population variance turns out to be 20.596.

### Notes on Calculating Sample & Population Variance

Keep in mind the following when calculating the sample and population variance:

• You should calculate the population variance when the dataset you’re working with represents an entire population, i.e. every value that you’re interested in.
• You should calculate the sample variance when the dataset you’re working with represents a sample taken from a larger population of interest.
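• The sample variance of a given dataset is never smaller than the population variance of the same dataset, because the sum of squared deviations is divided by n-1 instead of n. Dividing by n-1 compensates for the fact that deviations measured from the sample mean tend to understate the spread around the true population mean. The two values are equal only when every value in the dataset is identical, in which case both are zero.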
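The relationship between the two measures can be checked numerically. This short sketch reuses the dataset from the examples above and relies only on the `statistics` and `math` modules; the variable names are chosen for illustration. Since both formulas share the same numerator, the sample variance equals the population variance multiplied by n / (n-1).

```python
import math
from statistics import pvariance, variance

data = [4, 8, 12, 15, 9, 6, 14, 18, 12, 9, 16, 17, 17, 20, 14]
n = len(data)

s2 = variance(data)       # divides by n - 1
sigma2 = pvariance(data)  # divides by n

# the sample variance is the larger of the two for this dataset
print(s2 > sigma2)                             # True

# the two differ exactly by the factor n / (n - 1)
print(math.isclose(s2, sigma2 * n / (n - 1)))  # True

# when every value is identical, both variances are zero
constant = [5, 5, 5, 5]
print(variance(constant) == pvariance(constant) == 0)  # True
```

The factor n / (n-1) approaches 1 as n grows, so the gap between the two measures matters most for small datasets.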