This tutorial provides a simple explanation of lurking variables along with several examples.
What is a Lurking Variable?
A lurking variable is a variable that is not included in a statistical analysis, yet impacts the relationship between two variables within the analysis.
A lurking variable can hide the true relationship between variables, or it can make a relationship appear to exist when none does. Either way, lurking variables can make the results of a study misleading.
In observational studies, it’s important to remember that lurking variables could lead to misleading interpretations of the data and of the relationships between variables. In experimental studies, it’s important to design the experiment in a way that reduces the risk of lurking variables as much as possible.
Examples of Lurking Variables
The following examples illustrate several cases in which lurking variables could be present in a study:
A researcher finds that ice cream sales and shark attacks are highly positively correlated. Does this mean that increased ice cream sales are causing more shark attacks?
That’s unlikely. The more likely cause is the lurking variable weather. When it is warmer outside, more people buy ice cream and more people go in the ocean.
A researcher finds that popcorn consumption and the number of traffic accidents over the years are highly correlated. Does this mean that higher popcorn consumption is causing more traffic accidents?
That’s unlikely. The more likely cause is the lurking variable population. As the population increases, both the amount of popcorn consumed and the number of traffic accidents increase.
A study finds that the more volunteers who show up after a natural disaster, the greater the damage tends to be. Does this mean that volunteers are causing more damage to occur?
That’s unlikely. The more likely cause is the lurking variable size of the natural disaster. A larger natural disaster both draws more volunteers and causes more damage.
A study finds that glove sales and snowboarding accidents are highly correlated. Does this mean that gloves are causing more snowboarding accidents to occur?
That’s unlikely. The more likely cause is the lurking variable temperature. As temperature decreases, more people buy gloves and more people go snowboarding.
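As a rough illustration of the ice cream and shark attack example, the following Python sketch simulates data in which temperature drives both variables and neither one affects the other. All of the numbers (the temperature range, the slopes, and the noise levels) are made-up assumptions rather than real data, and the helper functions pearson and residuals are written here only for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def residuals(y, x):
    """Residuals from a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Lurking variable: daily temperature (hypothetical range, in degrees Celsius).
temperature = [rng.uniform(10, 35) for _ in range(1000)]

# Both outcomes depend on temperature only, never on each other.
ice_cream_sales = [20 * t + rng.gauss(0, 60) for t in temperature]
shark_attacks = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

# Correlation when temperature is ignored.
raw_corr = pearson(ice_cream_sales, shark_attacks)

# Correlation after removing the linear effect of temperature from both.
adjusted_corr = pearson(
    residuals(ice_cream_sales, temperature),
    residuals(shark_attacks, temperature),
)

print(f"ignoring temperature:      {raw_corr:.2f}")
print(f"accounting for temperature: {adjusted_corr:.2f}")
```

In this simulation the first correlation comes out strongly positive, while the second is close to zero once temperature is accounted for. That is the pattern a lurking variable produces, although real data are rarely this clean.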
How to Identify Lurking Variables
To discover lurking variables, it helps to have domain expertise in the area under study. If you know which variables could plausibly affect the relationship being studied but are not explicitly included in the study, you may be able to uncover potential lurking variables.
Another way to identify potential lurking variables is to examine residual plots. If there is a trend (either linear or non-linear) in the residuals, this could mean that a lurking variable not included in the study is affecting the variables within the study in some way.
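The residual check can be sketched in Python as follows. This is a minimal simulation under assumed numbers: a hypothetical unmeasured variable drifts upward with the order in which observations were collected, and the model only includes x. Instead of drawing a plot, the sketch measures the trend as the correlation between the residuals and the collection order. In practice you would usually plot the residuals against that order or against the fitted values.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for a regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 200
order = list(range(n))  # order in which observations were collected

x = [rng.uniform(0, 10) for _ in order]
lurking = [0.05 * i for i in order]  # assumed: drifts upward, never measured

# The response depends on x AND on the lurking variable.
y = [1.0 + 2.0 * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, lurking)]

# The analyst fits a model that only knows about x.
intercept, slope = fit_line(x, y)
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# A strong trend in the residuals hints that something is missing.
trend = pearson(order, resid)
print(f"correlation between residuals and collection order: {trend:.2f}")
```

Here the residuals rise steadily with the collection order, which signals that something outside the model is influencing the response. A trend like this does not say what the lurking variable is, only that one may be present.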
How to Eliminate the Risk of Lurking Variables
In observational studies, it can be very difficult to eliminate the risk of lurking variables. In most cases, the best you can do is simply identify, rather than prevent, potential lurking variables that may be impacting the study.
In experimental studies, however, the impact of lurking variables can mostly be eliminated with good experimental design.
For example, suppose we want to know whether two pills have a different impact on blood pressure. We know that lurking variables such as diet and smoking habits also impact blood pressure, so we can attempt to control for these lurking variables by using a randomized design. This means we randomly assign patients to take either the first or second pill.
Since we randomly assign patients to groups, we can expect the lurking variables to be roughly balanced across both groups, especially when the sample is reasonably large. This means any differences in blood pressure can more confidently be attributed to the pill, rather than to the effect of a lurking variable.
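The effect of random assignment can be sketched with a small Python simulation. Everything here is an assumption made for illustration: the share of smokers, the diet score, the size of each effect on blood pressure, and a hypothetical true difference of 5 units between the two pills.

```python
import random
from statistics import mean

rng = random.Random(2)  # fixed seed so the sketch is repeatable
n = 2000
TRUE_EFFECT = -5.0  # assumed: the second pill lowers blood pressure by 5 units

# Lurking variables for each patient (hypothetical distributions).
patients = []
for _ in range(n):
    smoker = int(rng.random() < 0.3)  # 1 if the patient smokes, else 0
    diet = rng.gauss(0, 1)            # standardized diet-quality score
    patients.append((smoker, diet))

# Random assignment: shuffle patient ids, first half takes the second pill.
ids = list(range(n))
rng.shuffle(ids)
second_pill = set(ids[: n // 2])


def blood_pressure(smoker, diet, takes_second):
    """Simulated blood pressure; smoking and diet matter regardless of pill."""
    effect = TRUE_EFFECT if takes_second else 0.0
    return 130 + 8 * smoker - 3 * diet + effect + rng.gauss(0, 10)


bp_first, bp_second = [], []
smoke_first, smoke_second = [], []
for i, (smoker, diet) in enumerate(patients):
    takes_second = i in second_pill
    bp = blood_pressure(smoker, diet, takes_second)
    if takes_second:
        bp_second.append(bp)
        smoke_second.append(smoker)
    else:
        bp_first.append(bp)
        smoke_first.append(smoker)

# Lurking variable balance: share of smokers should be similar in both groups.
smoker_gap = mean(smoke_second) - mean(smoke_first)

# Simple difference in group means as the estimated pill effect.
estimated_effect = mean(bp_second) - mean(bp_first)

print(f"difference in share of smokers: {smoker_gap:.3f}")
print(f"estimated effect of second pill: {estimated_effect:.2f}")
```

Because assignment ignores smoking and diet entirely, the two groups end up with a similar share of smokers, and the simple difference in mean blood pressure lands near the assumed true effect. With a much smaller sample, chance imbalances between the groups would be larger.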