In statistics, we use data to answer interesting questions. But not all data is created equal. There are actually four different **data measurement scales** that are used to categorize different types of data:

**1.** Nominal

**2.** Ordinal

**3.** Interval

**4.** Ratio

In this post, we define each measurement scale and provide examples of variables that can be used with each scale.

**Nominal**

The simplest measurement scale we can use to label variables is a **nominal scale**.

Nominal scale: A scale used to label variables that have no quantitative values.

Some examples of variables that can be measured on a nominal scale include:

- **Gender:** Male, female
- **Eye color:** Blue, green, brown
- **Hair color:** Blonde, black, brown, grey, other
- **Blood type:** O-, O+, A-, A+, B-, B+, AB-, AB+
- **Political Preference:** Republican, Democrat, Independent
- **Place you live:** City, suburbs, rural

Variables that can be measured on a nominal scale have the following properties:

- **They have no natural order.** For example, we can’t arrange eye colors in order of worst to best or lowest to highest.
- **Categories are mutually exclusive.** For example, an individual can’t have *both* blue and brown eyes. Similarly, an individual can’t live *both* in the city and in a rural area.
- **The only numbers we can calculate for these variables are *counts*.** For example, we can count how many individuals have blonde hair, how many have black hair, how many have brown hair, etc.
- **The only measure of central tendency we can calculate for these variables is *the mode*.** The mode tells us which category had the most counts. For example, we could find which eye color occurred most frequently.

The most common way that nominal scale data is collected is through a survey. For example, a researcher might survey 100 people and ask each of them what type of place they live in.

**Question:** What type of area do you live in?

**Possible Answers:** City, Suburbs, Rural.

Using this data, the researcher can find out how many people live in each area, as well as which area is the most common to live in.
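As a minimal sketch of that tally in Python, assuming the 100 answers are stored in a list (the counts below are made up for illustration and are not real survey data):

```python
from collections import Counter

# Hypothetical survey responses (made-up counts, not real data)
responses = ["City"] * 45 + ["Suburbs"] * 35 + ["Rural"] * 20

# Counts are the only numbers we can calculate for nominal data
counts = Counter(responses)

# The mode is the category with the highest count
mode_area, mode_count = counts.most_common(1)[0]

print(counts)
print("Most common area:", mode_area)
```

Note that nothing here sorts or averages the categories, since a nominal scale supports neither.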

**Ordinal**

The next type of measurement scale that we can use to label variables is an **ordinal scale**.

Ordinal scale: A scale used to label variables that have a natural order, but no quantifiable difference between values.

Some examples of variables that can be measured on an ordinal scale include:

- **Satisfaction:** Very unsatisfied, unsatisfied, neutral, satisfied, very satisfied
- **Socioeconomic status:** Low income, medium income, high income
- **Workplace status:** Entry Analyst, Analyst I, Analyst II, Lead Analyst
- **Degree of pain:** Small amount of pain, medium amount of pain, high amount of pain

Variables that can be measured on an ordinal scale have the following properties:

- **They have a natural order.** For example, “very satisfied” is better than “satisfied,” which is better than “neutral,” etc.
- **The difference between values can’t be evaluated.** For example, we can’t say that the difference between “very satisfied” and “satisfied” is the same as the difference between “satisfied” and “neutral.”
- **The two measures of central tendency we can calculate for these variables are *the mode* and *the median*.** The mode tells us which category had the most counts and the median tells us the “middle” value.

Companies often collect ordinal scale data through surveys when they want feedback about their product or service. For example, a grocery store might survey 100 recent customers and ask them about their overall experience.

**Question:** How satisfied were you with your most recent visit to our store?

**Possible Answers:** Very unsatisfied, unsatisfied, neutral, satisfied, very satisfied.

Using this data, the grocery store can analyze the total number of responses for each category, identify which response was most common, and identify the median response.
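One way to sketch this analysis in Python is shown below. The response counts are invented for illustration, and the variable names (`levels`, `ranks`, and so on) are assumptions rather than anything from a real survey tool. The key idea is that each answer is converted to its position in the ordering only so the responses can be sorted; the positions are never added or averaged.

```python
import statistics
from collections import Counter

# Answer choices listed in their natural order, lowest to highest
levels = ["Very unsatisfied", "Unsatisfied", "Neutral", "Satisfied", "Very satisfied"]

# Hypothetical responses from 100 customers (made-up counts, not real data)
responses = (
    ["Very unsatisfied"] * 5
    + ["Unsatisfied"] * 10
    + ["Neutral"] * 20
    + ["Satisfied"] * 40
    + ["Very satisfied"] * 25
)

# Total number of responses per category, and the most common response
counts = Counter(responses)
mode_response = counts.most_common(1)[0][0]

# Convert each response to its position in the ordering. The positions are
# used only to order the responses, never to add or average them.
ranks = [levels.index(r) for r in responses]

# median_low returns an actual data point, so the result is always a real
# category even when the number of responses is even
median_response = levels[statistics.median_low(ranks)]

print("Mode:", mode_response)
print("Median:", median_response)
```

Using `statistics.median_low` is a design choice for this sketch: with an even number of responses, an ordinary median would average the two middle ranks, which is not meaningful on an ordinal scale.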

**Interval**

The next type of measurement scale that we can use to label variables is an **interval scale**.

Interval scale: A scale used to label variables that have a natural order and a quantifiable difference between values, but no “true zero” value.

Some examples of variables that can be measured on an interval scale include:

- **Temperature:** Measured in Fahrenheit or Celsius
- **Credit Scores:** Measured from 300 to 850
- **SAT Scores:** Measured from 400 to 1,600

Variables that can be measured on an interval scale have the following properties:

- **These variables have a natural order.**
- **We can measure the mean, median, mode, and standard deviation of these variables.**
- **These variables have an exact difference between values.** Recall that ordinal variables have no exact difference between values: we don’t know if the difference between “very satisfied” and “satisfied” is the same as the difference between “satisfied” and “neutral.” For variables on an interval scale, though, we know that the difference between a credit score of 850 and 800 is exactly the same as the difference between 800 and 750.
- **These variables have no “true zero” value.** For example, it’s impossible to have a credit score of zero. It’s also impossible to have an SAT score of zero. And for temperatures measured in Fahrenheit or Celsius, negative values are possible (e.g. -10°F), which means zero is not a floor that values can’t go below.

The nice thing about interval scale data is that it can be analyzed in more ways than nominal or ordinal data. For example, researchers could gather data on the credit scores of residents in a certain county and calculate the following metrics:

- Median credit score (the “middle” credit score value)
- Mean credit score (the average credit score)
- Mode credit score (the credit score that occurs most often)
- Standard deviation of credit scores (a way to measure how spread out credit scores are)
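These four metrics can be computed with Python’s built-in `statistics` module. The sketch below uses a small set of invented credit scores, so the numbers are illustrative only:

```python
import statistics

# Hypothetical credit scores (made-up values, not real data)
scores = [620, 680, 700, 700, 720, 750, 800]

median_score = statistics.median(scores)  # the "middle" value
mean_score = statistics.mean(scores)      # the average
mode_score = statistics.mode(scores)      # the most frequent value
std_dev = statistics.stdev(scores)        # sample standard deviation

print("Median:", median_score)
print("Mean:", mean_score)
print("Mode:", mode_score)
print("Standard deviation:", round(std_dev, 2))
```

`statistics.stdev` treats the list as a sample; `statistics.pstdev` would be the choice if the scores were the entire population of interest.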

**Ratio**

The last type of measurement scale that we can use to label variables is a **ratio scale**.

Ratio scale: A scale used to label variables that have a natural order, a quantifiable difference between values, and a “true zero” value.

Some examples of variables that can be measured on a ratio scale include:

- **Height:** Can be measured in centimeters, inches, feet, etc. and cannot have a value below zero.
- **Weight:** Can be measured in kilograms, pounds, etc. and cannot have a value below zero.
- **Length:** Can be measured in centimeters, inches, feet, etc. and cannot have a value below zero.

Variables that can be measured on a ratio scale have the following properties:

- **These variables have a natural order.**
- **We can calculate the mean, median, mode, standard deviation, and a variety of other descriptive statistics for these variables.**
- **These variables have an exact difference between values.**
- **These variables have a “true zero” value.** For example, length, weight, and height all have a minimum value (zero) that values can’t go below. It’s not possible for ratio variables to take on negative values. For this reason, the *ratio* between values can be calculated. For example, someone who weighs 200 lbs. can be said to weigh *two times* as much as someone who weighs 100 lbs. Likewise, someone who is 6 feet tall is *1.5 times* as tall as someone who is 4 feet tall.

Data that can be measured on a ratio scale can be analyzed in a variety of ways. For example, researchers could gather data about the height of individuals in a certain school and calculate the following metrics:

- Median height
- Mean height
- Mode height
- Standard deviation of heights
- Ratio of tallest height to smallest height
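The sketch below computes a few of these metrics for a handful of invented heights, and also checks a property that is specific to ratio scales: converting units rescales every value but leaves the ratio between two values unchanged. The data and variable names are assumptions for illustration.

```python
import statistics

# Hypothetical student heights in centimeters (made-up values, not real data)
heights_cm = [120, 135, 150, 150, 165, 180]

mean_height = statistics.mean(heights_cm)
median_height = statistics.median(heights_cm)

# Ratio of tallest to smallest height, meaningful because zero is a true zero
ratio_cm = max(heights_cm) / min(heights_cm)

# The same heights in inches: every value changes, the ratio does not
heights_in = [h / 2.54 for h in heights_cm]
ratio_in = max(heights_in) / min(heights_in)

print("Mean height (cm):", mean_height)
print("Median height (cm):", median_height)
print("Tallest / smallest (cm):", ratio_cm)
print("Tallest / smallest (in):", round(ratio_in, 6))
```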

**Summary**

The following table provides a summary of the properties of each measurement scale:

Property | Nominal | Ordinal | Interval | Ratio |
---|---|---|---|---|
Has a natural “order” | NO | YES | YES | YES |
Mode can be calculated | YES | YES | YES | YES |
Median can be calculated | NO | YES | YES | YES |
Mean can be calculated | NO | NO | YES | YES |
Exact difference between values | NO | NO | YES | YES |
Has a “true zero” value | NO | NO | NO | YES |
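If you want to enforce these rules in an analysis script, the summary can be encoded as a simple lookup. This is only a sketch: the names `ALLOWED_STATS` and `can_compute` are hypothetical, and the mapping covers just the three measures of central tendency discussed in this post.

```python
# Measures of central tendency that are meaningful on each scale,
# following the summary in this post
ALLOWED_STATS = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean"},
    "ratio": {"mode", "median", "mean"},
}


def can_compute(scale: str, stat: str) -> bool:
    """Return True if `stat` is meaningful for data measured on `scale`."""
    return stat in ALLOWED_STATS[scale.lower()]


print(can_compute("nominal", "median"))
print(can_compute("ordinal", "median"))
print(can_compute("interval", "mean"))
```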

The summary table at the bottom, the Nominal value does not have natural order. Might be a typo

There seems to be a typo in the summary table. Nominal has no natural order.

There’s a discrepancy with the summary table and your post, i.e. Nominal data “have no natural order”

that is a great post

the clarity inspires me to incorporate some ideas into my slides for students 🙂

PS. the last table contains one mistake – which is obvious if the whole post is read, i.e. the natural order should not have a plus under nominal measurement (I believe the first line was intended to be “separate categories”)

Summary, Nominal, Has a natural “order” should not be YES

There’s a mistake in the table in the end: nominal variables do not have a “natural” order, so it should be a NO.

And I have to point out that temperature *does* have a true zero (it’s around −273 °C / −460 °F), though it’s true it doesn’t matter much in most people’s daily life.

Hi Zach,

First of all, thanks for all this information. I just want to add that in the table at the end, the property “has natural order” for the nominal scale should be “NO”, shouldn’t it?

Hi! Thank you for these great and interesting summaries ! Now I can see a big connected picture of the statistics and found answers to all the questions. Short and deep. It seems that in “Nominal” the order is assumed to be “NO”?

Hi there, I think there is a minor typo in the last table. Under Nominal, shouldn’t ‘Has a natural “order”’ be a No instead? 🙂

In summary table, you have mentioned that Nominal has a natural order. Can you please review if that is correct?

Dear Zach ,

The blog was very useful and I loved reading it. But in the summary, the natural order for nominal data is given as YES, which is incorrect. Kindly change it to NO.

Thank you.

Regards,

A.Hari babu

Good article. FYI Temperature does have a true 0 (-273C).

I think this means temperature in Kelvin is a ratio scale while temperature in Celsius or Fahrenheit are interval scales.

Hi Jason…Yes, the statement is true. Here’s an explanation of why temperature in Kelvin is a ratio scale while temperature in Celsius or Fahrenheit are interval scales:

### Interval Scale

An interval scale is a scale of measurement where the difference between any two values is meaningful. However, an interval scale does not have a true zero point, which means that ratios of values do not have meaningful interpretations.

– **Celsius and Fahrenheit**: These are examples of interval scales. In both scales, the difference between degrees is meaningful (e.g., the difference between 10°C and 20°C is the same as the difference between 30°C and 40°C). However, they do not have a true zero point. The zero point in Celsius (0°C) and Fahrenheit (0°F) are arbitrarily set and do not represent an absence of temperature. As a result, you cannot say that 20°C is twice as hot as 10°C.

### Ratio Scale

A ratio scale is a scale of measurement where the differences between values are meaningful, and there is a true zero point. This allows for the comparison of ratios.

– **Kelvin**: This is an example of a ratio scale. The Kelvin scale has a true zero point (0 K), which represents the complete absence of thermal energy. This makes ratios meaningful on the Kelvin scale. For example, 200 K is twice as hot as 100 K because the zero point represents an absolute absence of heat.

### Summary

– **Temperature in Celsius or Fahrenheit**: These are interval scales because they have meaningful differences between values but lack a true zero point.

– **Temperature in Kelvin**: This is a ratio scale because it has both meaningful differences between values and a true zero point.
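A quick numeric check makes the difference concrete. The sketch below takes the same two temperatures and computes their ratio in Celsius, Fahrenheit, and Kelvin (the helper function names are assumptions for illustration). The ratio changes between Celsius and Fahrenheit because their zero points are arbitrary, while differences stay consistent up to the unit conversion factor.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius to kelvins."""
    return c + 273.15


a_c, b_c = 10.0, 20.0

# "Twice as hot" in Celsius does not survive a change of scale
ratio_c = b_c / a_c
ratio_f = celsius_to_fahrenheit(b_c) / celsius_to_fahrenheit(a_c)

# On the Kelvin scale the ratio is physically meaningful
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

# Differences, by contrast, only pick up the constant factor 9/5
diff_c = b_c - a_c
diff_f = celsius_to_fahrenheit(b_c) - celsius_to_fahrenheit(a_c)

print("Ratio in Celsius:", ratio_c)
print("Ratio in Fahrenheit:", round(ratio_f, 3))
print("Ratio in Kelvin:", round(ratio_k, 3))
print("Difference in C and F:", diff_c, diff_f)
```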

TRUE ZERO means nothing exists at 0. For example, wt = 0 means there is nothing, while temperature = 0 means a temperature still exists, and even -15 degrees is also a temperature. So it is not a true zero.

Hi Meenakshi…Your comment touches on an important concept in statistics and measurement theory regarding the nature of scales and the interpretation of zero values. Here’s a more detailed explanation:

### TRUE ZERO vs. Arbitrary Zero

In the context of measurement scales, the concept of “true zero” refers to a point where the value of zero represents the complete absence of the quantity being measured. This is in contrast to scales where zero is just an arbitrary reference point and does not indicate the absence of the quantity.

**True Zero (Absolute Zero)**:

– **Example**: Weight (wt = 0)

– When we say an object weighs 0 kilograms, it means the object has no weight at all. There is a complete absence of the property being measured, which is weight.

– True zero is essential for ratio scales, where it makes sense to say one quantity is twice as much as another (e.g., 10 kg is twice as heavy as 5 kg).

**Arbitrary Zero (Relative Zero)**:

– **Example**: Temperature in Celsius or Fahrenheit

– In Celsius, 0 degrees does not mean there is no temperature; it is just a point on the scale chosen based on the freezing point of water. Temperatures can go below zero (e.g., -15 degrees), indicating that zero is not the absence of temperature but rather a relative point on the scale.

– This is typical of interval scales, where differences between values are meaningful, but there is no true zero. You cannot say 20°C is twice as hot as 10°C because the zero point is arbitrary.

### Scale Types and Their Properties

1. **Nominal Scale**:

– Categories without any order (e.g., types of fruit: apple, orange, banana).

2. **Ordinal Scale**:

– Categories with a meaningful order but no consistent difference between them (e.g., rankings: 1st, 2nd, 3rd).

3. **Interval Scale**:

– Ordered values with consistent differences between them, but no true zero (e.g., temperature in Celsius or Fahrenheit).

4. **Ratio Scale**:

– Ordered values with consistent differences and a true zero point, allowing for the comparison of ratios (e.g., weight, height, distance).

### Practical Implications

Understanding the type of scale you’re dealing with is crucial for proper data analysis:

– **Statistical Operations**:

– Ratio scales allow for a full range of statistical operations, including addition, subtraction, multiplication, and division.

– Interval scales support addition and subtraction, but not meaningful multiplication or division.

– **Interpretation**:

– Zero on a ratio scale implies a complete lack of the measured attribute (e.g., 0 kg = no weight).

– Zero on an interval scale is a point of reference, not an indication of the absence of the attribute (e.g., 0°C = a specific temperature, not the absence of temperature).

### Conclusion

Your comment correctly distinguishes between true zero (as in weight) and arbitrary zero (as in temperature). Recognizing this difference is important for accurate data interpretation and appropriate application of statistical methods.

Thanks for information you provide. In the summary table there is a trivial mistake: Has a natural “order” property set True for nominal scale. It must be False

There is no natural order in “nominal” variables

Hey, I just found a little problem with your table – “Has natural order” is set as “YES” for Nominal, while it should be “NO”.

In this article’s summary table, the nominal scale is shown as having a natural order, but that is not correct.

This was the perfect clarification tool for my introduction to statistics study. It refined and clarified the main points from my textbook in an easy-to-understand manner. The way these scales were explained and then demonstrated with examples helped me to grasp the concepts I was struggling with while reading the text.

It is a detailed explanation. Interesting!

Zach,

You are awesome! thank you

This analysis does not mention an important class of data that is often encountered in ML. The cyclic class involves data that is part of a repeating pattern. Examples include:

Days of the week

Months of the year

Hours of the day

Wind direction

etc.

This type of data kind-of has rank but kind-of does not. Is Monday South? This type of data does have “exact difference between values” provided you employ a modulo or sinusoidal calculation. N is closer to NW than S is to E.

Try doing KNN modelling with day-of-the week data – you will soon see the need for special treatment for cyclical data.

Hi Nick…You’re absolutely right. Cyclical data requires special handling due to its repeating nature. Here are some strategies to properly encode cyclical data for machine learning models:

### Encoding Cyclical Data

1. **Sinusoidal Transformation**:

– **Concept**: Use sine and cosine transformations to capture the cyclical nature of the data. This ensures that the cyclical continuity is preserved, e.g., the proximity of “Monday” to both “Sunday” and “Tuesday”.

– **Formula**: For a cyclic variable \(x\) with \(T\) distinct values (e.g., 7 for days of the week):

\[ x_{\sin} = \sin\left(\frac{2\pi x}{T}\right), \qquad x_{\cos} = \cos\left(\frac{2\pi x}{T}\right) \]

– **Example**: Encoding days of the week.

– Monday (1): \( \sin\left(\frac{2\pi \cdot 1}{7}\right) \), \( \cos\left(\frac{2\pi \cdot 1}{7}\right) \)

– Sunday (0): \( \sin\left(\frac{2\pi \cdot 0}{7}\right) \), \( \cos\left(\frac{2\pi \cdot 0}{7}\right) \)

2. **Modulo Operation**:

– **Concept**: Use modular arithmetic to measure distance around the cycle, so that the last value wraps around to the first.

– **Example**: For hour of the day (0-23), hours 23 and 0 are adjacent: going around the cycle, their distance is min(|23 − 0|, 24 − |23 − 0|) = 1.

3. **Using Periodic Functions**:

– **Concept**: Periodic functions can be used to handle the cyclical nature explicitly within the model.

– **Example**: Wind direction (in degrees) can be represented using:

\[ \text{wind}_{\sin} = \sin\left(\frac{\text{direction} \cdot 2\pi}{360}\right), \qquad \text{wind}_{\cos} = \cos\left(\frac{\text{direction} \cdot 2\pi}{360}\right) \]
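The modulo idea from item 2 can be sketched as a small distance function. The name `circular_distance` is an assumption for illustration, and the compass bearings assume N = 0°, E = 90°, S = 180°, and NW = 315°.

```python
def circular_distance(a: int, b: int, period: int) -> int:
    """Shortest distance between two positions on a cycle of length `period`."""
    d = abs(a - b) % period
    return min(d, period - d)


# Hour of the day: 23:00 and 00:00 are one hour apart, not 23
hours_apart = circular_distance(23, 0, 24)

# Wind direction in degrees: N to NW is closer than S to E
n_to_nw = circular_distance(0, 315, 360)
s_to_e = circular_distance(180, 90, 360)

print("23 vs 0 hours:", hours_apart)
print("N to NW:", n_to_nw, "degrees")
print("S to E:", s_to_e, "degrees")
```

A function like this can be supplied as a custom metric to a distance-based model when only one cyclical feature is involved; the sine and cosine encoding is usually easier to combine with other features.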

### Applying to KNN

When using K-Nearest Neighbors (KNN) or other distance-based algorithms, these transformations help maintain the cyclic continuity and ensure that the proximity in the original cyclic space is preserved in the feature space.

**Example in KNN with Days of the Week**:

Suppose you want to use KNN to predict an outcome based on the day of the week. First, encode the day of the week using the sinusoidal transformation:

– Monday (1): \( \sin\left(\frac{2\pi \cdot 1}{7}\right) \approx 0.782 \), \( \cos\left(\frac{2\pi \cdot 1}{7}\right) \approx 0.623 \)

– Tuesday (2): \( \sin\left(\frac{2\pi \cdot 2}{7}\right) \approx 0.975 \), \( \cos\left(\frac{2\pi \cdot 2}{7}\right) \approx -0.223 \)

By transforming the cyclical data this way, you ensure that the model treats the beginning and end of the cycle as adjacent, preserving the inherent properties of the data.
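To see the wrap-around behavior, here is a minimal sketch that encodes a few days and compares Euclidean distances, which is the kind of distance many KNN implementations use by default. It assumes the same numbering as above (Sunday = 0, Monday = 1, and so on up to Saturday = 6); the function name `encode_cyclic` is hypothetical.

```python
import math


def encode_cyclic(x: float, period: float) -> tuple:
    """Map a cyclic value to a (sin, cos) point on the unit circle."""
    angle = 2 * math.pi * x / period
    return math.sin(angle), math.cos(angle)


# Assumed numbering: Sunday = 0, Monday = 1, ..., Saturday = 6
sunday = encode_cyclic(0, 7)
monday = encode_cyclic(1, 7)
wednesday = encode_cyclic(3, 7)
saturday = encode_cyclic(6, 7)

# Euclidean distances between the encoded points
d_sat_sun = math.dist(saturday, sunday)
d_sun_mon = math.dist(sunday, monday)
d_sun_wed = math.dist(sunday, wednesday)

print("Saturday to Sunday:", round(d_sat_sun, 3))
print("Sunday to Monday:", round(d_sun_mon, 3))
print("Sunday to Wednesday:", round(d_sun_wed, 3))
```

With the raw integer codes, Saturday (6) and Sunday (0) would be 6 apart, the largest possible gap. After encoding, Saturday to Sunday is the same distance as Sunday to Monday, and both are smaller than Sunday to Wednesday.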

### Resources for Further Reading

1. **“Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists” by Alice Zheng and Amanda Casari**: This book covers various feature engineering techniques, including handling cyclical data.

2. **“Machine Learning with Python Cookbook” by Chris Albon**: Offers practical recipes for handling different types of data, including cyclical data.

These methods are crucial for accurately representing cyclical data in machine learning models, ensuring that the models capture the true relationships within the data.

Hi Nick…Sorry the equations did not display properly. Let me know if you have any specific questions.