Pseudoreplication is a common problem that occurs in statistical studies. In this article we share the following:
- A simple explanation of pseudoreplication
- Examples of pseudoreplication
- How to avoid pseudoreplication
Let’s jump in!
What is Pseudoreplication?
Suppose we want to know whether medication A or medication B is better at lowering blood pressure.
If we simply give medication A to one person and medication B to a different person, and measure their corresponding drop in blood pressure, we won’t have nearly enough information from the study to generalize the results to people other than these two.
In addition, each individual person responds differently to medications, so the differences that we see in blood pressure may be due to random chance.
However, if we give medication A to 30 people and give medication B to another 30 people, then we can obtain far more information about the variability of each medication. This will allow us to use statistical inference to find out if one of the medications is more effective at reducing blood pressure than the other on average.
This simple example highlights an important component of statistical inference: when we are interested in testing for differences between treatments, we need replication. A common mistake in studies is to rely on pseudoreplication instead.
Replication refers to having more than one experimental unit with the same treatment. Each unit with the same treatment is known as a replicate. For example, if we only give each medication to one person, we don’t have replication. But if we give each medication to 30 different people, then we have 30 replicates of each treatment.
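To make this concrete, here is a minimal Python sketch of a Welch's t statistic for comparing the two medications. The helper name `welch_t` and the simulated blood pressure drops (a mean of 10 and a standard deviation of 5, both made up for illustration) are assumptions, not data from a real study.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Return Welch's t statistic for two independent samples."""
    mean_diff = statistics.mean(a) - statistics.mean(b)
    # The standard error needs a variance estimate from each group,
    # which requires at least two replicates per group.
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return mean_diff / se


rng = random.Random(0)

# Simulated drops in blood pressure for 30 people per medication (illustrative values only).
drops_a = [rng.gauss(10, 5) for _ in range(30)]
drops_b = [rng.gauss(10, 5) for _ in range(30)]
t_stat = welch_t(drops_a, drops_b)

# With one person per medication there is nothing to estimate variability from.
try:
    welch_t([12.0], [9.0])
    no_replication_error = None
except statistics.StatisticsError as err:
    no_replication_error = str(err)
```

With 30 replicates per medication the statistic can be computed, while the single-person call fails because a sample variance needs at least two data points.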
Pseudoreplication refers to the case when treatments are not replicated or when the replicates are not statistically independent.
Let’s walk through three examples of pseudoreplication to gain a better understanding of it.
Scenario: Suppose researchers want to know whether program A or program B is more effective at helping high school basketball players jump higher. One high school team at a local school is randomly chosen to implement program A for one month while another high school team at another local school is randomly chosen to implement program B for one month. After one month, the players on each team are tested to measure their max jump.
Problem: There is no true replication in this study. In this example, the schools are the experimental units that are randomly assigned a treatment. Although several players from each school participate in the program, the players are pseudo-replicates because the results for each player are not independent; they’re influenced by the coaches, the practice styles, and other team-specific factors.
Remedy: We need several replicates of the experimental units. In this case, since the experimental units are the teams, we need to assign more than one high school team to each program. For example, assign five local high school teams to use program A and five different teams to use program B. This would give us information about the variability of the effect of the different programs.
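One way to keep the analysis honest in a design like this is to collapse the player measurements to a single value per team before comparing programs. The sketch below assumes the data arrive as (team, jump) pairs; the function name `unit_means`, the team labels, and the jump heights are all hypothetical.

```python
import statistics


def unit_means(measurements):
    """Collapse (unit_id, value) pairs to one mean per experimental unit."""
    by_unit = {}
    for unit_id, value in measurements:
        by_unit.setdefault(unit_id, []).append(value)
    return {unit_id: statistics.mean(values) for unit_id, values in by_unit.items()}


# Hypothetical max jump measurements in inches for program A: (team, jump).
program_a = [
    ("team1", 24.0),
    ("team1", 26.0),
    ("team2", 22.0),
    ("team2", 23.0),
]

team_means_a = unit_means(program_a)

# The sample size for program A is the number of teams, not the number of players.
n_replicates_a = len(team_means_a)
```

Here four player measurements yield only two replicates, because the team is the unit that received the treatment.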
Scenario: Suppose researchers want to know whether medication A or medication B is more effective at lowering blood pressure. Medication A is given to one person and medication B is given to another person. Each person’s blood pressure is measured once per week for ten weeks.
Problem: There is no true replication in this study. Each medication is given to only one person, and the ten weekly measurements on that person are repeated measures, not replicates, because measurements taken on the same person are not independent of each other.
Remedy: We need several replicates of the experimental units. In this case, since the experimental units are the individuals, we need to assign more than one individual to use each medication. For example, assign 30 individuals to use medication A and 30 individuals to use medication B. This would give us information about the variability of the effect of the different medications.
Scenario: Suppose researchers want to know whether formula A or formula B is more effective at making plants grow taller. Formula A is sprinkled on 100 plants in one field and formula B is sprinkled on 100 plants in a different field several miles down the road.
Problem: There is no true replication in this study. In this example, the fields are the experimental units that are randomly assigned a treatment. Although 100 plants in each field are sprinkled with the formula, the plants are pseudo-replicates because the results for each plant are not independent; the 100 plants in each field share whatever conditions are present in that field (soil conditions, natural predators, weather conditions, amount of rainfall in one field vs the other, etc.).
Remedy: We need several replicates of the experimental units. In this case, one remedy is to make the individual plants the experimental units: sprinkle formula A on 100 plants and formula B on 100 plants that are all located in the same field. This ensures that the conditions of the field won’t skew the results of one treatment more than the other, since the plants that receive formula A and the plants that receive formula B all share the same field.
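If all 200 plants sit in one field, which plants get which formula should be decided at random so that no patch of the field favors one formula. The following is a minimal sketch of such an assignment; the function name `randomize_treatments`, the plant labels, and the seed are assumptions made for illustration.

```python
import random


def randomize_treatments(unit_ids, treatments, seed=None):
    """Randomly assign treatments to units in equal-sized groups."""
    if len(unit_ids) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)  # random order, so group membership is random
    group_size = len(shuffled) // len(treatments)
    return {unit: treatments[i // group_size] for i, unit in enumerate(shuffled)}


# 200 hypothetical plants in a single field, split evenly between the two formulas.
plants = [f"plant_{i}" for i in range(200)]
assignment = randomize_treatments(plants, ["A", "B"], seed=42)
```

Each plant ends up with exactly one formula, and each formula is applied to 100 plants.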
Effects of Pseudoreplication
When pseudoreplication occurs in a study, variability is likely to be underestimated. This has two effects:
- Confidence intervals are likely to be too narrow (and untrustworthy)
- The Type I error rate (the probability of falsely rejecting a true null hypothesis) will be larger than the stated significance level
Both of these are undesirable effects in a study and make statistical inference unreliable. This means it will be difficult to generalize the findings from the study to a larger population.
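A small simulation can illustrate the inflated Type I error rate. The sketch below uses the plant example under a true null hypothesis (the two formulas have identical effects) and compares two designs: each formula applied in its own field, and both formulas applied within one shared field. The field and plant standard deviations, the number of simulations, and the use of a normal critical value as a large-sample approximation are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Return Welch's t statistic for two independent samples."""
    mean_diff = statistics.fmean(a) - statistics.fmean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return mean_diff / se


def false_positive_rate(separate_fields, n_sims=1000, n_plants=100,
                        field_sd=1.0, plant_sd=2.0, seed=0):
    """Estimate how often a plant-level test rejects when there is no real effect."""
    rng = random.Random(seed)
    # Two-sided 5% critical value; normal approximation to the t distribution.
    critical = statistics.NormalDist().inv_cdf(0.975)
    rejections = 0
    for _ in range(n_sims):
        # Each field shifts the height of every plant growing in it.
        field_a = rng.gauss(0, field_sd)
        field_b = rng.gauss(0, field_sd) if separate_fields else field_a
        heights_a = [field_a + rng.gauss(0, plant_sd) for _ in range(n_plants)]
        heights_b = [field_b + rng.gauss(0, plant_sd) for _ in range(n_plants)]
        if abs(welch_t(heights_a, heights_b)) > critical:
            rejections += 1
    return rejections / n_sims


# Pseudoreplicated design: one field per formula, plants treated as independent.
pseudo_rate = false_positive_rate(separate_fields=True)

# Design where both formulas are applied within the same field.
same_field_rate = false_positive_rate(separate_fields=False)
```

Under these assumed settings, `pseudo_rate` should come out far above the nominal 5% level, while `same_field_rate` should stay close to it, because the field-to-field differences are mistaken for a treatment effect only in the first design.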
What to Do About It
If possible, be sure to avoid pseudoreplication when designing an experiment. Note that for observational studies, pseudoreplication is common and often can’t be avoided.
If it’s not possible to avoid pseudoreplication, then consider the study to be a preliminary study that can be used to design a better study in the future. In addition, be open about the limitations of the study when reporting any findings to a larger audience.