# 5 Statistical Biases to Avoid


Statistical biases are systematic errors in how research is conducted or how data is collected and analyzed. They can threaten the validity of findings and the quality of the decisions based on them. These biases are not confined to academic research; they show up across all applications of quantitative analysis, in fields as varied as economics, engineering, the social sciences, and health research. Avoiding and mitigating statistical biases is an essential skill for any competent data scientist, statistician, or hobbyist (or anyone at all, really), as failure to do so may well undermine the integrity of an analysis and its usefulness for making inferences or proposals.

The ability to spot and manage biases is an important step toward making quantitative work more trustworthy and reliable. To that end, in this post we will go over 5 examples of statistical bias. For each, we will discuss the nature of the bias, provide a real-world example, and note how the bias can undermine research or decision-making. We will also describe how to reduce the bias in practice, so as to reach more objective and valid conclusions.

## 1. Confirmation Bias

Confirmation bias is the tendency to pay more attention to information that supports one’s preconceptions. It manifests when an investigator gives greater weight to data that supports their hypothesis while discounting data that contradicts it. Scientists are not alone in this respect: a data analyst might choose a less appropriate statistical test in the hope of getting the desired outcome. The consequences can be serious, as the bias can produce research results that are not genuinely falsifiable, which in turn can lead to poor policy-setting. A good way to fight confirmation bias is to actively and thoroughly seek out information that challenges one’s existing beliefs. Collaboration with others and peer review are also good ways to hold confirmation bias at bay.
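One way to see why shopping around for a favorable test is risky is to ask how often at least one of several tests would come out "significant" by chance alone. The sketch below is only an illustration under an idealized assumption: the tests are treated as independent, each with the same false-positive rate `alpha`. Tests run on the same data are correlated, so the real inflation will differ. The function name is hypothetical.

```python
def chance_of_false_positive(alpha, n_tests):
    """Probability that at least one of n_tests independent tests
    is significant at level alpha when no real effect exists."""
    return 1 - (1 - alpha) ** n_tests


# Illustration: how the chance grows as more tests are tried.
for k in (1, 5, 10, 20):
    print(k, round(chance_of_false_positive(0.05, k), 3))
```

Under these assumptions, trying ten tests at the 5% level gives roughly a 40% chance of at least one spurious "finding", which is why it helps to fix the analysis plan before looking at the results.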

## 2. Sampling Bias

Sampling bias occurs when the sample used in a study does not accurately represent the population. For example, an online survey might only get responses from younger people, missing an older demographic, or a health study might recruit patients from a single medical center rather than individuals from a variety of socio-economic backgrounds. A non-representative sample can lead to findings that cannot be applied to the population at large, which could in turn skew policy and business decisions based on that data. To mitigate this, use random sampling techniques, and consider stratification to make sure that every population segment is adequately represented.
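As a rough illustration of stratification, the sketch below draws a random sample in which each stratum receives a share of the sample proportional to its share of the population. Proportional allocation is one choice among several and is an assumption here, as are the toy population, the `age_group` field, and the function name.

```python
import random
from collections import Counter


def stratified_sample(population, key, n, seed=0):
    """Draw about n items, giving each stratum a share of the sample
    proportional to its share of the population."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))  # random draw within the stratum
    return sample


# Hypothetical population: 700 people under 40 and 300 aged 40 or over.
population = [
    {"id": i, "age_group": "under_40" if i < 700 else "40_plus"}
    for i in range(1000)
]
sample = stratified_sample(population, key=lambda p: p["age_group"], n=100)
counts = Counter(p["age_group"] for p in sample)
print(counts)
```

Because rounding is applied per stratum, the total can differ slightly from `n` when the proportions do not divide evenly; in this toy example they do.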

## 3. Survivorship Bias

Survivorship bias refers to concentrating only on the things that “survive” a process while neglecting those that do not. In finance, this can mean examining only the companies that exist today, without accounting for those that went bankrupt while following the same investment strategies. It can also happen in military history, when only the strategies that succeeded are celebrated and failed operations are ignored. Because failures are omitted, this bias results in unrealistic optimism. Be sure to include failures in any analysis, and work from a complete dataset that encompasses all outcomes.
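A small simulation can make the effect concrete. The sketch below is purely illustrative: it assumes a set of hypothetical funds whose returns are drawn from a normal distribution with a true mean of zero, and it treats any fund below an arbitrary threshold as having failed and disappeared from the record. None of the numbers come from real data.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical returns for 1,000 funds; the true average return is 0.
returns = [rng.gauss(0.0, 0.3) for _ in range(1000)]

# Assumption: funds that lose more than 20% shut down and vanish from the data.
FAILURE_THRESHOLD = -0.2
survivors = [r for r in returns if r >= FAILURE_THRESHOLD]

mean_all = statistics.mean(returns)
mean_survivors = statistics.mean(survivors)

print(f"All funds:      {len(returns)} funds, mean return {mean_all:.3f}")
print(f"Survivors only: {len(survivors)} funds, mean return {mean_survivors:.3f}")
```

Averaging over survivors alone always overstates performance here, because the lowest returns have been removed before the average is taken.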

## 4. Anchoring Bias

Anchoring bias is a cognitive bias in which individuals rely too heavily on an initial piece of information, the ‘anchor’, when making decisions. In a negotiation, for example, the first price you hear can have a profound effect on how you judge every later offer. This leads to poor decisions when the anchor is incorrect or unrepresentative of reality. Be aware of your anchor when planning ahead, and seek out additional information to challenge it. Structured decision-making techniques and processes can also be useful for counteracting this type of bias.

## 5. Publication Bias

Publication bias occurs when the outcome of a study influences its likelihood of being published, in particular when statistically significant results are favored. It stems from the perceived preference of academic journals for studies with significant findings, which leads researchers to set aside results that do not meet this bar. The bias skews the scientific literature and, in turn, can misinform public policy, scientific consensus, and even public perception. Promoting the registration of studies and the acceptance of all results, regardless of their significance, is one strategy for mitigating publication bias. Journals that publish negative results and replication studies also contribute to a more balanced scientific dialogue.
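The skew can be sketched with a simulation. The example below rests on several simplifying assumptions: many hypothetical studies estimate the same small true effect, each uses a one-sided z-test with a known standard deviation of 1, and only studies with a significant positive result get "published". Real publication decisions are far messier; the point is only to show the direction of the distortion.

```python
import math
import random
import statistics

rng = random.Random(7)

TRUE_EFFECT = 0.1   # assumed small true effect
SAMPLE_SIZE = 50    # observations per hypothetical study
N_STUDIES = 2000
Z_CRITICAL = 1.96   # assumed significance cutoff

standard_error = 1 / math.sqrt(SAMPLE_SIZE)  # known standard deviation of 1

estimates = []
published = []
for _ in range(N_STUDIES):
    data = [rng.gauss(TRUE_EFFECT, 1.0) for _ in range(SAMPLE_SIZE)]
    estimate = statistics.fmean(data)
    estimates.append(estimate)
    # Assumption: only significant positive results reach publication.
    if estimate / standard_error > Z_CRITICAL:
        published.append(estimate)

mean_all = statistics.fmean(estimates)
mean_published = statistics.fmean(published)

print(f"True effect:                {TRUE_EFFECT}")
print(f"Mean of all {N_STUDIES} studies:   {mean_all:.3f}")
print(f"Mean of {len(published)} published:      {mean_published:.3f}")
```

Under this rule every published estimate must exceed the significance cutoff, so the published average sits well above the true effect even though the full set of studies is centered on it. Registering studies in advance makes the unpublished ones visible.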

## Summary

Whether you are a data scientist or a statistician, a practicing professional or a student, being able to understand and control for the five biases we have just discussed is important, even essential. Scientists who produce biased data, and conclusions based on it, are setting themselves up for a hard crash during the replication or testing phase. Sound scientific methods and strategies are critical in countering the biases mentioned.

May 13, 2024