**Standardization** and **normalization** are two ways to rescale data.

**Standardization** rescales a dataset to have a mean of 0 and a standard deviation of 1. It uses the following formula to do so:

**x_{new} = (x_{i} – x̄) / s**

where:

- **x_{i}**: The i^{th} value in the dataset
- **x̄**: The sample mean
- **s**: The sample standard deviation
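As a quick sketch, the formula can be applied directly in Python (the function name is illustrative; `ddof=1` tells NumPy to use the sample standard deviation, matching the formula above):

```python
import numpy as np

def standardize(x):
    """Rescale values to have mean 0 and sample standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)  # ddof=1 -> sample standard deviation

z = standardize([2, 4, 6, 8, 10])
print(z.mean())       # approximately 0
print(z.std(ddof=1))  # approximately 1
```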

**Normalization** rescales a dataset so that each value falls between 0 and 1. It uses the following formula to do so:

**x_{new} = (x_{i} – x_{min}) / (x_{max} – x_{min})**

where:

- **x_{i}**: The i^{th} value in the dataset
- **x_{min}**: The minimum value in the dataset
- **x_{max}**: The maximum value in the dataset
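A minimal sketch of this formula in Python (the function name and sample values are illustrative):

```python
import numpy as np

def normalize(x):
    """Rescale values to the range [0, 1] via min-max scaling."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

n = normalize([13, 16, 19, 71])
print(n)  # the minimum maps to 0.0 and the maximum maps to 1.0
```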

The following examples show how to standardize and normalize a dataset in practice.

**Example: How to Standardize Data**

Suppose we have the following dataset:

The mean value in the dataset is 43.15 and the standard deviation is 22.13.

To standardize the first value of **13**, we would apply the formula shared earlier:

**x_{new} = (13 – 43.15) / 22.13 = -1.36**

To standardize the second value of **16**, we would use the same formula:

**x_{new} = (16 – 43.15) / 22.13 = -1.23**

To standardize the third value of **19**, we would use the same formula:

**x_{new} = (19 – 43.15) / 22.13 = -1.09**

We can use this exact same formula to standardize each value in the original dataset:
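The hand calculations above can be checked in a few lines, assuming the summary statistics already given (mean 43.15, sample standard deviation 22.13); the dataset itself is not reproduced here beyond the three values shown:

```python
mean, s = 43.15, 22.13  # sample mean and standard deviation given above
zs = [round((x - mean) / s, 2) for x in (13, 16, 19)]
print(zs)  # [-1.36, -1.23, -1.09]
```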

**Example: How to Normalize Data**

Once again, suppose we have the following dataset:

The minimum value in the dataset is 13 and the maximum value is 71.

To normalize the first value of **13**, we would apply the formula shared earlier:

**x_{new} = (13 – 13) / (71 – 13) = 0**

To normalize the second value of **16**, we would use the same formula:

**x_{new} = (16 – 13) / (71 – 13) = 0.0517**

To normalize the third value of **19**, we would use the same formula:

**x_{new} = (19 – 13) / (71 – 13) = 0.1034**

We can use this exact same formula to normalize each value in the original dataset to be between 0 and 1:
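As with standardization, these hand calculations can be verified directly, assuming the minimum (13) and maximum (71) given above:

```python
lo, hi = 13, 71  # minimum and maximum given above
ns = [round((x - lo) / (hi - lo), 4) for x in (13, 16, 19)]
print(ns)  # [0.0, 0.0517, 0.1034]
```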

**Standardization vs. Normalization: When to Use Each**

Typically we **normalize** data when performing some type of analysis in which we have multiple variables that are measured on different scales and we want each of the variables to have the same range.

This prevents one variable from being overly influential, especially if it’s measured in different units (e.g. if one variable is measured in inches and another is measured in yards).

On the other hand, we typically **standardize** data when we’d like to know how many standard deviations each value in a dataset lies from the mean.

For example, we might have a list of exam scores for 500 students at a particular school and we’d like to know how many standard deviations each exam score lies from the mean score.

In this case, we could standardize the raw data to find out this information. Then, a standardized score of 1.26 would tell us that the exam score of that particular student lies 1.26 standard deviations above the mean exam score.
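A small sketch of this interpretation, using a handful of hypothetical exam scores (the values are made up for illustration; `ddof=1` gives the sample standard deviation):

```python
import numpy as np

scores = np.array([72, 85, 90, 61, 78], dtype=float)  # hypothetical exam scores
z = (scores - scores.mean()) / scores.std(ddof=1)     # z-score for each student

# A z-score of +1.26 would mean 1.26 standard deviations above the mean.
for score, zi in zip(scores, z):
    print(f"score {score:.0f}: {zi:+.2f} standard deviations from the mean")
```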

Whether you decide to normalize or standardize your data, keep the following in mind:

- A **normalized dataset** will always have values that range between 0 and 1.
- A **standardized dataset** will have a mean of 0 and a standard deviation of 1, but there is no specific upper or lower bound for its maximum and minimum values.

Depending on your particular scenario, it may make more sense to normalize or standardize the data.

**Additional Resources**

The following tutorials explain how to standardize and normalize data in different statistical software:

How to Normalize Data in R

How to Normalize Data in Excel

How to Normalize Data in Python

How to Standardize Data in R

Tsk tsk. You show an example of standardization but you use the term “normalize”. Following that is an example of normalization which you then call normalization. Aside from that mislabeling of the standardization example I found this to be a clear explanation. Thank you.

Hi Steve…Thank you for pointing out the terminology issue. It’s important to use the correct terms to avoid confusion. Here’s the corrected explanation with proper terminology for standardization and normalization, along with the appropriate example code.

### Standardization vs. Normalization

– **Standardization**: This process rescales the data to have a mean of 0 and a standard deviation of 1. It is also known as Z-score normalization.

– **Normalization**: This process rescales the data to a fixed range, typically [0, 1].

### Corrected Example for Standardization

In this example, we will standardize the data:

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Load your dataset
dt = pd.read_excel("/content/DATA.xlsx")
dt.head()

# Split the data into features (X) and target (Y)
X = dt.iloc[:, 0:20]
Y = dt.iloc[:, 21]

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.15, random_state=100)

# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define the neural network model
model = Sequential([
    Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
    Dropout(0.5),
    Dense(16, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(X_train_scaled, y_train, epochs=250, batch_size=10, validation_split=0.2)

# Evaluate the model on the test data
loss, accuracy = model.evaluate(X_test_scaled, y_test)
print('Test Loss:', loss)
print('Test Accuracy:', accuracy)

# Make predictions on new data
Xpred = pd.read_excel('New_data.xlsx')
Xpred_scaled = scaler.transform(Xpred)  # Standardize the new data
predicted_probabilities = model.predict(Xpred_scaled)
predicted_classes = (predicted_probabilities > 0.5).astype(int)

# Add predictions to the dataframe
Xpred['yes/no'] = predicted_classes
Xpred['prob'] = predicted_probabilities

# Save the results to a new Excel file
Xpred.to_excel('New_prob.xlsx', index=False)

# Display the first 50 rows of the new dataframe with predictions
print(Xpred.head(50))
```

### Corrected Example for Normalization

In this example, we will normalize the data to the range [0, 1]:

```python
from sklearn.preprocessing import MinMaxScaler

# Normalize the data
scaler = MinMaxScaler()
X_train_normalized = scaler.fit_transform(X_train)
X_test_normalized = scaler.transform(X_test)

# Define the neural network model
model = Sequential([
    Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
    Dropout(0.5),
    Dense(16, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(X_train_normalized, y_train, epochs=250, batch_size=10, validation_split=0.2)

# Evaluate the model on the test data
loss, accuracy = model.evaluate(X_test_normalized, y_test)
print('Test Loss:', loss)
print('Test Accuracy:', accuracy)

# Make predictions on new data
Xpred = pd.read_excel('New_data.xlsx')
Xpred_normalized = scaler.transform(Xpred)  # Normalize the new data
predicted_probabilities = model.predict(Xpred_normalized)
predicted_classes = (predicted_probabilities > 0.5).astype(int)

# Add predictions to the dataframe
Xpred['yes/no'] = predicted_classes
Xpred['prob'] = predicted_probabilities

# Save the results to a new Excel file
Xpred.to_excel('New_prob.xlsx', index=False)

# Display the first 50 rows of the new dataframe with predictions
print(Xpred.head(50))
```

### Summary of Corrections

– **Standardization**: Adjusts data to have a mean of 0 and a standard deviation of 1.

– **Normalization**: Adjusts data to a specific range, typically [0, 1].

Using the correct terminology ensures clarity and better understanding of the preprocessing steps. Thank you for highlighting the need for these corrections.