# 6 Puzzling Statistical Paradoxes Explained

In the vast world of statistics, paradoxes often emerge to challenge our intuition and reveal subtle complexities hidden in data. In this article, we will explore six puzzling statistical paradoxes to deepen your understanding of statistical phenomena and their broader implications.

## 1. The Gambler’s Fallacy

The gambler’s fallacy revolves around the misconception that past outcomes influence future probabilities, even when events are independent. The fallacy suggests that if a particular event has occurred repeatedly in the past, it is less likely to occur in the future.

For example, when rolling a fair die, if a six has just been rolled three times in a row, it would be the gambler’s fallacy to presume that the next roll has a lower chance of also being a six. In reality, each roll is statistically independent of the others, so the probability of a six remains 1/6.
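A quick simulation makes the independence concrete. The sketch below rolls a simulated fair die many times and looks only at the rolls that immediately follow a run of three or more sixes. The function name `next_roll_after_streak`, the roll count, and the seed are illustrative assumptions, not part of any standard library.

```python
import random

def next_roll_after_streak(num_rolls=1_000_000, streak=3, face=6, seed=42):
    """Estimate P(next roll == face) given that the previous `streak` (or more) rolls were all `face`.

    Hypothetical helper for illustration; assumes a fair six-sided die.
    """
    rng = random.Random(seed)                      # seeded so the estimate is reproducible
    rolls = rng.choices(range(1, 7), k=num_rolls)  # fair die: each face 1..6 equally likely
    run = 0      # length of the current run of `face`
    trials = 0   # rolls that immediately follow a long-enough run of `face`
    hits = 0     # how many of those rolls were `face` again
    for r in rolls:
        if run >= streak:
            trials += 1
            hits += (r == face)
        run = run + 1 if r == face else 0
    return hits / trials, trials

p, n = next_roll_after_streak()
print(f"P(six | three sixes in a row) ~ {p:.3f} over {n} streaks (1/6 ~ 0.167)")
```

The estimate lands close to 1/6: the preceding streak does not make another six any less likely.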

Understanding this paradox can guard against the allure of the fallacy and help you make more rational choices based on accurate assessments of probability and event independence.

## 2. The Birthday Paradox

A favorite of introductory statistics courses, the birthday paradox answers the question: how many people must be in a room for it to be more likely than not that two of them have the same birthday (excluding the year of birth)?

At first glance, it may seem like this number needs to be in the hundreds. However, with just 23 people, there is a greater than 50% chance of a matching birthday. The result feels counterintuitive because the number of possible comparisons grows much faster than the number of people: 23 people form 253 distinct pairs, and every pair is another chance for a match.
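The exact probability is easiest to compute through its complement: the chance that all birthdays are distinct is a product of shrinking fractions. The sketch below assumes 365 equally likely birthdays and ignores leap years; the helper names are hypothetical.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes every day of a `days`-day year is equally likely (no leap years).
    """
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

def num_pairs(n):
    """Number of distinct pairs that can be compared in a group of n people."""
    return n * (n - 1) // 2

def smallest_group(threshold=0.5, days=365):
    """Smallest group size whose shared-birthday probability exceeds `threshold`."""
    n = 1
    while prob_shared_birthday(n, days) <= threshold:
        n += 1
    return n

n = smallest_group()
print(n, num_pairs(n), round(prob_shared_birthday(n), 3))  # prints: 23 253 0.507
```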

This paradox has practical applications in cryptography, where it determines how quickly collisions appear among hash values, and in data analysis. Understanding how it works can help you make better-informed decisions about chance and coincidence.

## 3. The Friendship Paradox

The friendship paradox states that, on average, your friends have more friends than you do. This paradox arises from the structure of social networks and from the way we encounter other people through our own connections.

Consider a social network where some individuals have many friends while others have few. When each person lists their friends, those with numerous friendships appear on many lists and are therefore counted many times. These popular individuals skew the average friend count of people’s friends upward, so the average person ends up with fewer friends than their friends have.
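The effect shows up even in a tiny network. The sketch below uses a hypothetical five-person friendship graph (the names and connections are made up for illustration) and compares the average friend count with the average of each person’s friends’ friend counts.

```python
from statistics import mean

# Hypothetical network; every friendship is listed in both directions.
friends = {
    "Ana": ["Ben", "Cy", "Dee", "Eve"],
    "Ben": ["Ana"],
    "Cy": ["Ana", "Dee"],
    "Dee": ["Ana", "Cy"],
    "Eve": ["Ana"],
}

def friend_count(person):
    return len(friends[person])

def avg_friends_of_friends(person):
    """Average number of friends that `person`'s friends have."""
    return mean(friend_count(f) for f in friends[person])

avg_friends = mean(friend_count(p) for p in friends)
network_avg_fof = mean(avg_friends_of_friends(p) for p in friends)
fewer_than_friends = [p for p in friends if friend_count(p) < avg_friends_of_friends(p)]

print(f"{avg_friends:.1f} {network_avg_fof:.1f}")  # prints: 2.0 3.1
print(fewer_than_friends)                         # prints: ['Ben', 'Cy', 'Dee', 'Eve']
```

Ana, the well-connected hub, shows up in every other person’s friend list, which is what pulls their friends’ average above their own.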

This paradox comes into play when predicting how information or trends spread through a network and, more generally, when reasoning about how connections are distributed.

## 4. Simpson’s Paradox

An important decision that must be made when summarizing and presenting data is how to group individuals together. Some variables can be presented either continuously or categorically; age, for example, can be reported as-is or binned into age groups. If bins are used, their boundaries must also be defined carefully.

Simpson’s paradox highlights the importance of how data is presented: it occurs when trends observed within groups disappear or reverse when the data is looked at in aggregate. A classic example involves admission rates to graduate school programs. When looking at admission rates for the school as a whole, it may appear that the school is biased against a certain gender. However, when admission rates are compared department by department, the bias disappears. This reversal occurs when one demographic sends more of its applications to the most competitive departments, which admit a smaller share of all applicants.
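A minimal sketch of the reversal, using hypothetical numbers rather than any real admissions data: women are admitted at a higher rate in both departments (85% vs. 80% in department A, 25% vs. 20% in department B), yet at a lower rate overall (37% vs. 68%), because most women applied to the competitive department B. The department labels and helper functions are illustrative assumptions.

```python
# (department, group): (applicants, admitted), hypothetical numbers
applications = {
    ("A", "men"): (80, 64),
    ("A", "women"): (20, 17),
    ("B", "men"): (20, 4),
    ("B", "women"): (80, 20),
}

def rate(applicants, admitted):
    return admitted / applicants

def per_department_rates(data):
    """Admission rate for each (department, group) cell."""
    return {key: rate(*counts) for key, counts in data.items()}

def aggregate_rates(data):
    """Admission rate per group after pooling all departments."""
    totals = {}
    for (_, group), (applied, admitted) in data.items():
        prev_applied, prev_admitted = totals.get(group, (0, 0))
        totals[group] = (prev_applied + applied, prev_admitted + admitted)
    return {group: rate(*counts) for group, counts in totals.items()}

for (dept, group), r in per_department_rates(applications).items():
    print(f"Dept {dept}, {group}: {r:.0%}")
for group, r in aggregate_rates(applications).items():
    print(f"Overall, {group}: {r:.0%}")
```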

To avoid being misled by this paradox, consider carefully how data is aggregated and whether those aggregations are valid and reflect the true nature of the data.

## 5. Berkson’s Paradox

Correlations between two variables are an important finding and inform many business decisions. However, it is vital to keep Berkson’s paradox in mind: it arises when a correlation between two variables is produced by the way participants were selected, so that it disappears or changes when the selection process changes.

This occurs when inclusion in the sample depends on the variables being studied. For example, a study may report that standardized test scores in English and mathematics are correlated. However, examination of the data may reveal that only Advanced Placement exams were included, and students who opt into these exams are more likely to score higher overall. In the broader context of all students, there may no longer be a correlation between English and math scores, or the correlation may reverse.
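A small simulation shows how selection alone can manufacture a correlation. The sketch assumes, purely for illustration, that English and math scores are independent standard normal variables across all students, and that a hypothetical study only includes students whose combined score clears a cutoff. The cutoff, sample size, seed, and helper names are all assumptions.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def simulate(n=20_000, cutoff=1.5, seed=0):
    rng = random.Random(seed)
    # Assumption: the two scores are independent in the full population
    english = [rng.gauss(0, 1) for _ in range(n)]
    math_scores = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical selection rule: keep only students with a high combined score
    selected = [(e, m) for e, m in zip(english, math_scores) if e + m > cutoff]
    sel_english, sel_math = zip(*selected)
    return pearson(english, math_scores), pearson(sel_english, sel_math), len(selected)

r_all, r_sel, k = simulate()
print(f"all students: r = {r_all:+.2f}; selected ({k} students): r = {r_sel:+.2f}")
```

The full population shows a correlation near zero, while the selected subgroup shows a clearly negative one: among students admitted by a high combined score, a weaker score in one subject tends to be offset by a stronger score in the other.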

Berkson’s paradox highlights the importance of understanding potential biases related to who is included in or excluded from a given study population.

## 6. Will Rogers Phenomenon

The Will Rogers phenomenon describes a statistical effect in which moving an observation from one group to another increases the mean of a variable in both groups. This happens when the moved observation is below the mean of the group it leaves but above the mean of the group it joins, and it typically occurs in situations where boundaries shift or categories are reclassified.

Consider a sports league with two divisions. After a reevaluation of performance, a few teams from the higher division are moved into the lower division. This can improve the average performance of both divisions: the transferred teams may have been below average in the higher division, where they struggled against stronger competition, yet still above average in the lower division, where they raise the overall level of competition.
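The arithmetic behind the effect is simple enough to check directly. The sketch below uses hypothetical team ratings (higher means stronger) and hypothetical helper names; moving the 70-rated team raises both division averages because 70 sits below the higher division’s mean and above the lower division’s mean.

```python
from statistics import mean

def move(source, target, value):
    """Return copies of both groups after moving one `value` from source to target."""
    new_source = list(source)
    new_source.remove(value)
    return new_source, list(target) + [value]

def raises_both_means(source, target, value):
    """True when the moved value is below the source mean but above the target mean."""
    return mean(target) < value < mean(source)

# Hypothetical team ratings (higher = stronger)
higher = [90, 85, 70]
lower = [60, 55, 50]

new_higher, new_lower = move(higher, lower, 70)
print(f"higher division: {mean(higher):.2f} -> {mean(new_higher):.2f}")  # 81.67 -> 87.50
print(f"lower division:  {mean(lower):.2f} -> {mean(new_lower):.2f}")    # 55.00 -> 58.75
```

Note that no team’s rating changed: both averages rise purely because of the reclassification.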

This phenomenon highlights the dynamic nature of data and the complexities involved in data categorization. Being mindful of this effect can improve the robustness and reliability of your statistical conclusions.

## Conclusion

The study of statistical paradoxes provides valuable insights into the complexities of data analysis and interpretation. Each paradox challenges intuition and underscores the importance of sound statistical reasoning and of validating each decision made during the analysis process.