# Benford’s Law: Explanation & Examples

In statistics, Benford’s Law describes the frequency distribution of leading digits of numbers in a dataset.

This law explains that in many real-life datasets, the leading digit is more likely to be small than large.

Specifically, Benford’s Law gives the following probability for each possible leading digit from 1 to 9:

• 1: 30.1%
• 2: 17.6%
• 3: 12.5%
• 4: 9.7%
• 5: 7.9%
• 6: 6.7%
• 7: 5.8%
• 8: 5.1%
• 9: 4.6%
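These percentages come from the formula P(d) = log10(1 + 1/d), where d is the leading digit. As a quick check, the sketch below recomputes the table; the function name `benford_probability` is just an illustrative choice.

```python
import math


def benford_probability(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1 through 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Percentages rounded to one decimal place, keyed by leading digit.
benford_table = {d: round(100 * benford_probability(d), 1) for d in range(1, 10)}

for d, pct in benford_table.items():
    print(f"{d}: {pct}%")
```

The nine probabilities sum to 1, since every nonzero number has exactly one leading digit.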

This pattern of leading digits has been observed in many kinds of data, including house prices, stock prices, street addresses, death rates, and many others.

Note: Benford’s Law is also sometimes referred to as the Law of Anomalous Numbers.

It turns out that this law has several real-life applications, the most common being fraud detection.

If the numbers in a dataset are expected to follow Benford’s Law, then it can raise a red flag for fraud or manipulation if larger leading digits occur more frequently than they should.

This can be useful when banks or other institutions are attempting to identify fraudulent account records or transactions.
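To run this kind of check, you first need the observed share of each leading digit in the data. The following is a minimal sketch of one way to do that; the helper names `leading_digit` and `leading_digit_shares` are illustrative, and the sample data (the first 100 powers of 2) is chosen only because it is a well-known sequence that tracks Benford’s Law closely.

```python
from collections import Counter


def leading_digit(value: float) -> int:
    """Return the first nonzero digit of a nonzero number."""
    if value == 0:
        raise ValueError("zero has no leading digit")
    # Scientific notation puts the first significant digit in position 0;
    # 17 significant digits avoids rounding a value like 9.9999999 up to 10.
    return int(f"{abs(value):.16e}"[0])


def leading_digit_shares(values) -> dict:
    """Share of values starting with each digit 1-9 (zeros are skipped)."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Illustrative data only: 2, 4, 8, ..., 2**100.
powers_of_two = [2 ** n for n in range(1, 101)]
shares = leading_digit_shares(powers_of_two)

for d, share in shares.items():
    print(f"{d}: {share:.1%}")
```

The observed shares can then be compared with the expected percentages listed earlier; large gaps are a reason to look closer, not proof of fraud on their own.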

## When Does Benford’s Law Apply to Datasets?

Benford’s Law doesn’t apply to all datasets, but in general it tends to hold for datasets that meet the following requirements:

• There is no artificial minimum or maximum on the values in the dataset.
• The values in the dataset span several orders of magnitude.
• The values in the dataset are measured rather than assigned or bucketed.
• The dataset only includes quantitative data.

If all of these conditions are met, then we would expect Benford’s Law to describe the relative frequency of leading digits in the dataset.

Some examples of when a real dataset would not meet these requirements include:

• A dataset that describes the height of individuals (has a minimum and maximum height)
• A dataset that describes IQ values (values do not span even one order of magnitude)
• A dataset that describes people’s movie ratings (values are assigned or bucketed)
• A dataset that describes political preferences (values are not quantitative)

In each of these examples, we would not expect Benford’s Law to describe the relative frequency of leading digits in the dataset.
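The order-of-magnitude requirement is the easiest one to check numerically. As a rough sketch, the base-10 logarithm of the ratio between the largest and smallest absolute values tells you how many orders of magnitude the data covers. The function name and both sample lists below are made up for illustration.

```python
import math


def orders_of_magnitude(values) -> float:
    """Base-10 log of the ratio between the largest and smallest nonzero magnitudes."""
    magnitudes = [abs(v) for v in values if v != 0]
    return math.log10(max(magnitudes) / min(magnitudes))


# Hypothetical sample data, for illustration only.
heights_cm = [152, 160, 171, 178, 185, 193]
city_populations = [850, 4_200, 37_000, 210_000, 1_900_000]

print(f"heights: {orders_of_magnitude(heights_cm):.2f} orders of magnitude")
print(f"populations: {orders_of_magnitude(city_populations):.2f} orders of magnitude")
```

A result well below 1, as with the heights, suggests the data is too narrow for Benford’s Law to be a sensible reference. This is only a screening heuristic; the other requirements still have to be judged from how the data was produced.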

## Example: Using Benford’s Law in the Real World

Benford’s Law has been used to detect fraud related to socio-economic data in the real world.

For example, it’s known that population sizes of cities and towns tend to follow Benford’s Law.

Suppose a government official received a report that showed the following distribution of leading digits in the reported populations of various cities in their state:

• 1: 10%
• 2: 15%
• 3: 12%
• 4: 8%
• 5: 9%
• 6: 10%
• 7: 11%
• 8: 10%
• 9: 15%

Notice that each leading digit occurs at roughly the same rate.

This would raise a potential red flag that the data could be fraudulent, because people who make up numbers tend to spread the leading digits fairly evenly instead of favoring small digits.

If the data were accurate, we would expect the smaller digits to occur much more frequently, just as Benford’s Law would predict.
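The gap between the reported figures and Benford’s Law can be put into numbers. The sketch below compares the percentages from the report with the expected ones and computes the mean absolute deviation in percentage points. This is a descriptive summary, not a formal test: a chi-square test would need the number of cities in the report, which is not given here. The variable names are illustrative.

```python
import math

# Leading-digit percentages from the report in this example.
reported_pct = {1: 10, 2: 15, 3: 12, 4: 8, 5: 9, 6: 10, 7: 11, 8: 10, 9: 15}

# Expected percentages under Benford's Law: 100 * log10(1 + 1/d).
expected_pct = {d: 100 * math.log10(1 + 1 / d) for d in range(1, 10)}

# Per-digit gap: positive means the digit appears more often than expected.
gaps = {d: reported_pct[d] - expected_pct[d] for d in range(1, 10)}

# Mean absolute deviation across the nine digits, in percentage points.
mean_abs_deviation = sum(abs(g) for g in gaps.values()) / len(gaps)

for d in range(1, 10):
    print(f"{d}: reported {reported_pct[d]}%, expected {expected_pct[d]:.1f}%, gap {gaps[d]:+.1f}")
print(f"mean absolute deviation: {mean_abs_deviation:.1f} percentage points")
```

The largest gap is at the digit 1, which appears in 10% of the reported values against an expected 30.1%, while the digit 9 appears more than three times as often as expected.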