The term face validity refers to the extent to which a test appears, on the surface, to measure what it claims to measure.
For example, a researcher may create a questionnaire that aims to measure depression levels in individuals. A colleague may then look over the questions and deem the questionnaire to be valid purely on face value.
In other words, on its surface the questionnaire seems to be constructed in such a way that it’s a good tool to use to measure depression levels.
Face validity is the most informal and subjective way to assess the validity of a test.
How to Measure Face Validity
In practice, we often measure face validity by asking multiple people to rate the validity of a test using a Likert scale.
For example, the potential responses could be:
1. The test is completely appropriate for measuring a certain construct.
2. The test is mostly appropriate.
3. The test is somewhat appropriate.
4. The test is neither appropriate nor inappropriate.
5. The test is somewhat inappropriate.
6. The test is mostly inappropriate.
7. The test is completely inappropriate.
There are three potential groups of people who could provide ratings for the face validity of a test:
1. People who take the test.
Individuals who actually take the test could provide ratings on face validity.
2. People who work with the test in some way.
Employers, university staff, coaching staff, or other individuals who work with the test in some way could provide ratings on face validity.
3. Members of the general public who are interested in the test.
Parents, teachers, school board members, city council members, and others with an interest in the test could provide ratings on face validity.
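A test is considered to have high face validity if there is a high level of agreement among raters that it is appropriate for measuring the intended construct. Agreement alone is not enough: if raters consistently judge the test to be inappropriate, its face validity is low.
For example, if most raters say that the test or questionnaire is completely or mostly appropriate for measuring a certain construct, then we would say that the test has high face validity.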
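As a rough illustration, the ratings from the three groups above could be summarized as follows. This is only a sketch: the ratings are made up, the group names and the `summarize` function are hypothetical, and treating responses 1 through 3 as "favorable" is an assumption rather than an established cutoff.

```python
from statistics import median

# Hypothetical ratings on the 7-point scale above
# (1 = completely appropriate, 7 = completely inappropriate),
# grouped by the type of rater.
ratings = {
    "test_takers": [1, 2, 2, 3, 1],
    "test_users": [2, 2, 1, 4, 2],
    "general_public": [3, 2, 5, 2, 1],
}


def summarize(scores, favorable_max=3):
    """Return the count, median rating, and share of favorable ratings.

    favorable_max=3 treats responses 1-3 as favorable; this cutoff is an
    assumption chosen for illustration.
    """
    favorable = sum(1 for s in scores if s <= favorable_max)
    return {
        "n": len(scores),
        "median": median(scores),
        "share_favorable": favorable / len(scores),
    }


# Summarize each rater group separately, then all raters together.
summary = {group: summarize(scores) for group, scores in ratings.items()}
all_scores = [s for scores in ratings.values() for s in scores]
overall = summarize(all_scores)

for group, stats in summary.items():
    print(group, stats)
print("overall", overall)
```

A low median together with a large share of favorable ratings in every group would suggest high face validity. A large gap between groups would suggest that the test looks appropriate to some audiences but not to others.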
Why Use Face Validity?
Face validity is a highly informal way to measure validity, but it can be useful for quickly ruling out sub-par research practices and techniques.
For example, suppose a questionnaire that aims to measure depression included questions such as:
- “What is your favorite color?”
- “What is your political party affiliation?”
We could quickly say that the questionnaire does not have face validity and likely doesn’t do a good job of measuring depression levels, since the questions are irrelevant to depression.
Thus, face validity offers a quick way to provide feedback on a test, questionnaire, or exam that doesn’t appear to measure the thing that it sets out to measure.
If a test does have face validity, we would likely go on to verify that it has more rigorous forms of validity like criterion validity, content validity, etc.