In statistics, we’re often interested in studying characteristics of specific populations. For example, we might be interested in studying:

- The overall job satisfaction of mechanical engineers in a certain city.
- Political preferences of individuals in a certain county.
- The age distribution of individuals in a certain country.
- Movie preferences of students in a certain school.

In each of these examples, we want to gain an understanding of a certain **population**.

Population: The entire group of individuals you are interested in studying.

Unfortunately, it can be expensive and time-consuming to gather data for every individual in a population, which is why researchers typically gather data for a **sample** from a population and then generalize the findings from the sample to the larger population.

Sample: A subset of the population.

For example, suppose we want to understand the movie preferences of students in a certain school that has 1,000 total students. Since it would take too long to survey every individual student, we might instead take a random sample of 100 students and ask them about their preferences.

The 1,000 students represent the population, while the 100 randomly selected students represent the sample. Once we collect data for the sample of 100 students, we can then generalize those findings to the overall population of 1,000 students, *but only if our sample is representative of our population*.

Representative sample: A sample in which the characteristics of the individuals closely match the characteristics of the overall population.

Ideally, we want our sample to be like a “mini version” of our population. So, if the overall student population is composed of 50% girls and 50% boys, our sample would not be representative if it included 90% boys and only 10% girls.

Or, if the overall population is composed of equal parts freshmen, sophomores, juniors, and seniors, then our sample would not be representative if it only included freshmen.

**The Importance of Obtaining a Representative Sample**

The reason we want a representative sample is so that we can confidently generalize the findings from the sample to the population.

For example, suppose we want to know what percentage of students at a certain school prefer “drama” as their favorite movie genre. If the total student population is a mix of 50% boys and 50% girls, then a sample with a mix of 90% boys and 10% girls might lead to biased results if far fewer boys than girls prefer drama as their favorite genre.

Or, if the total population is a mix of equal parts freshmen, sophomores, juniors, and seniors, then a sample with only freshmen might lead to biased results as well if younger students (e.g. freshmen) tend to prefer drama at much higher rates than older students.

If the characteristics of individuals in our sample do not closely match the characteristics of the individuals in the overall population, then we cannot generalize the findings from the sample to the overall population with any confidence.
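
To make the size of this bias concrete, here is a minimal sketch in Python. The preference rates (20% of boys and 60% of girls choosing drama) are assumptions made up purely for illustration, as is the helper name `overall_rate`; only the 50/50 and 90/10 mixes come from the example above.

```python
def overall_rate(group_shares, group_rates):
    """Weighted average of per-group rates, weighted by each group's share."""
    return sum(group_shares[g] * group_rates[g] for g in group_shares)

# Hypothetical drama-preference rates (assumed for illustration only).
drama_rate = {"boys": 0.20, "girls": 0.60}

population_mix = {"boys": 0.50, "girls": 0.50}     # true make-up of the school
skewed_sample_mix = {"boys": 0.90, "girls": 0.10}  # unrepresentative sample

true_rate = overall_rate(population_mix, drama_rate)
skewed_estimate = overall_rate(skewed_sample_mix, drama_rate)

print(f"True population rate:   {true_rate:.0%}")
print(f"Skewed sample estimate: {skewed_estimate:.0%}")
```

Under these assumed rates, the skewed sample would understate the share of students who prefer drama (about 24% instead of 40%), even if every student in it answered honestly.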

**How to Obtain a Representative Sample**

To maximize the chances that we obtain a representative sample, we need to focus on two things when obtaining our sample:

**1. Use an appropriate sampling method.**

There are many ways to obtain a sample from a population, but here are three methods that are likely to obtain a representative sample:

**Simple random sample:** Randomly select individuals through the use of a random number generator or some means of random selection.

- **Example:** Assign a number to all 1,000 students. Then, use a random number generator to select 100 random numbers and use the corresponding students as members in the sample.
- **Benefit:** Simple random samples are usually representative of the population we’re interested in since every member has an equal chance of being included in the sample.
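
As a rough sketch of this example in Python, the standard library's `random.sample` draws without replacement, so no student can be picked twice. The ID numbering and the seed value are assumptions for illustration.

```python
import random

# Assumed setup: students are numbered 1 through 1,000.
population = list(range(1, 1001))

# A fixed seed makes the draw reproducible; the value 42 is arbitrary.
rng = random.Random(42)

# Draw 100 distinct student numbers, each student equally likely to be chosen.
sample = rng.sample(population, k=100)

print(sorted(sample)[:10])  # peek at the ten lowest selected student numbers
```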

**Systematic random sample:** Put every member of a population into some order. Choose a random starting point and select every *n*th member to be in the sample.

- **Example:** Create a list in alphabetical order based on the last name of all 1,000 students, randomly choose a starting point, and pick every 10th student to be in the sample.
- **Benefit:** Systematic random samples are usually representative of the population we’re interested in since every member has an equal chance of being included in the sample.
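
One way to sketch this in Python is shown below. The `roster` list stands in for the alphabetized student list, and the helper name `systematic_sample` is hypothetical. The sketch assumes the random starting point is chosen among the first 10 positions, so that every student has the same 1-in-10 chance of selection.

```python
import random

def systematic_sample(ordered_members, step, rng):
    """Pick a random start within the first `step` positions, then take every `step`-th member."""
    start = rng.randrange(step)  # 0, 1, ..., step - 1
    return ordered_members[start::step]

# Placeholder for the alphabetized list of 1,000 students (names are made up).
roster = [f"student_{i:04d}" for i in range(1, 1001)]

sample = systematic_sample(roster, step=10, rng=random.Random(0))
print(len(sample))  # 1,000 students with a step of 10 gives 100 students
```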

**Stratified random sample:** Split a population into groups. Randomly select some members from each group to be in the sample.

- **Example:** Split up all students according to their grade – freshmen, sophomores, juniors, and seniors. Randomly select 25 students from each grade to be in the sample.
- **Benefit:** Stratified random samples ensure that an equal number of students from each grade are included in the sample.
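
A minimal Python sketch of this example follows. It assumes, for illustration only, that each grade has exactly 250 students; the helper name `stratified_sample` and the student labels are hypothetical.

```python
import random

def stratified_sample(strata, per_group, rng):
    """Draw `per_group` members at random, without replacement, from every group."""
    return {name: rng.sample(members, per_group) for name, members in strata.items()}

grades = ["freshmen", "sophomores", "juniors", "seniors"]

# Assumed for illustration: 250 students in each grade (1,000 in total).
strata = {g: [f"{g}_{i}" for i in range(250)] for g in grades}

sample = stratified_sample(strata, per_group=25, rng=random.Random(1))

for grade, members in sample.items():
    print(grade, len(members))
```

Because the draw happens separately inside each grade, the sample always contains 25 students per grade, which a simple random sample of 100 would only achieve by chance.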

**2. Make sure the sample is large enough.**

Along with using an appropriate sampling method, it’s important to ensure that the sample is large enough so that we have enough data to generalize to the larger population.

For example, a sample of eight students – a boy and a girl from each grade – might represent a mini version of the larger population, but it’s probably not large enough to capture all of the variability that naturally exists in the responses of the students.

So, how large does your sample need to be?

That depends on the following factors:

- **Population size:** In general, the larger the population size, the larger the sample needs to be. For example, you’ll generally need a larger sample to generalize your findings to an entire city than to a single school, though the required sample size levels off once the population is very large.
- **Confidence level:** How confident you want to be that the true population value you’re interested in falls within your confidence interval. Common confidence levels include 90%, 95%, and 99%. The higher the confidence level, the larger your sample needs to be.
- **Margin of error:** How much error you’re willing to tolerate. No sample will be perfect, so you must be willing to accept at least some amount of error. Most research studies will report their findings with a margin of error, for example “40% of students reported that *drama* was their favorite movie genre, with a margin of error of +/- 5%.” The lower the margin of error, the larger your sample needs to be.
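
To show how these three factors interact, here is a sketch of one common textbook approach for estimating a proportion: Cochran's formula with a finite population correction. It is not necessarily what any particular online calculator uses. The function name `required_sample_size` and the default `p=0.5` (the most conservative guess for the true proportion) are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(population_size, confidence=0.95, margin=0.05, p=0.5):
    """Sample size for estimating a proportion (Cochran's formula with finite population correction)."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z**2 * p * (1 - p) / margin**2
    # Shrink it to account for the finite population.
    n = n0 / (1 + (n0 - 1) / population_size)
    return ceil(n)

# The school example: 1,000 students, 95% confidence, +/- 5% margin of error.
print(required_sample_size(1_000))

# Effect of each factor, holding the others fixed.
print(required_sample_size(1_000, margin=0.03))      # tighter margin of error
print(required_sample_size(1_000, confidence=0.99))  # higher confidence level
print(required_sample_size(1_000_000))               # much larger population
```

Under these assumptions, tightening the margin of error or raising the confidence level increases the required sample size, while growing the population from 1,000 to 1,000,000 increases it only modestly.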

There are plenty of sample size calculators online to help you determine how large your sample needs to be based on these factors. The sample size calculator from SurveyMonkey is particularly easy to use.

**Things to Keep in Mind**

Even if you use an appropriate sampling method and ensure that your sample is large enough, keep in mind the following things:

- There will always be *some* sampling error. The sample will never be perfectly representative of the larger population.
- In general, the larger the sample, the more likely it will be representative of the population.
- You need to strike a balance between sample size and real-world variables like time and cost. A larger sample might have a higher chance of representing the overall population, but it might be more expensive and time-consuming to obtain.
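
The trade-off between sample size and cost can be seen in a short sketch that uses the standard large-sample approximation for the margin of error of a proportion. The function name `margin_of_error` and the default `p=0.5` are assumptions for illustration, and the approximation ignores any finite population correction.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n, confidence=0.95, p=0.5):
    """Approximate margin of error for a proportion estimated from a simple random sample of size n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)

for n in (25, 100, 400, 1600):
    print(f"n = {n:>4}: +/- {margin_of_error(n):.1%}")
```

In this approximation, quadrupling the sample size only halves the margin of error, so each additional gain in precision costs more data than the last.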