# A quick introduction to statistics

## What is Statistics?

Statistics is the field concerned with collecting, analyzing, and interpreting data.

Data is a collection of recorded observations, most often numbers.

Here’s an example of some data on the height of basketball players:

Here’s another example of some data on the population of different U.S. cities:

When we organize data into rows and columns, we call it a dataset.

Datasets come in all shapes and sizes.

The dataset above on city populations has only two columns and ten rows, but some datasets have thousands of columns and millions of rows.
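To make the idea of rows and columns concrete, here is a minimal Python sketch of a dataset. The city names and population figures are made-up placeholders, not real data, and the list-of-dictionaries layout is just one simple way to represent a table.

```python
# A tiny, made-up dataset: each dict is one row, each key is one column.
# City names and population figures are placeholders, not real data.
dataset = [
    {"city": "City A", "population": 850_000},
    {"city": "City B", "population": 420_000},
    {"city": "City C", "population": 95_000},
]

num_rows = len(dataset)        # one row per city
num_columns = len(dataset[0])  # "city" and "population"

print(f"{num_rows} rows x {num_columns} columns")
for row in dataset:
    print(f"{row['city']:<8} {row['population']:>9,}")
```

Larger datasets follow the same pattern, just with more rows and more columns.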

## The Two Branches of Statistics

There are two branches in the field of statistics:

### 1. Descriptive statistics

The first branch of statistics is descriptive statistics.

We use descriptive statistics to describe a dataset.

Suppose we have this dataset on the square footage of houses in your neighborhood:

One descriptive statistic we could use to describe this dataset is the mean (the average) home size. It turns out that the mean home size is 1,845 square feet.

Another descriptive statistic we could use is the range (the difference in size between the largest and smallest home), which turns out to be 1,520 square feet.

These are two examples of descriptive statistics. They both describe our dataset in different ways and help us gain a better understanding of the underlying data.

Instead of staring at a long list of home sizes, we can look at the mean to understand how large the average home in this neighborhood is. And by knowing the range, we can understand how much the largest and smallest homes in this neighborhood differ in size.
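As a rough illustration, the mean and the range can each be computed in a line or two of Python. The home sizes below are hypothetical values chosen for this sketch; they are not the neighborhood dataset described above, so the results differ from the figures quoted in the text.

```python
from statistics import mean

# Hypothetical home sizes in square feet (illustrative values only,
# not the neighborhood dataset discussed in the text).
home_sizes = [1200, 1500, 1650, 1800, 2100, 2400]

# Mean: the sum of all values divided by the number of values.
mean_size = mean(home_sizes)

# Range: the largest value minus the smallest value.
size_range = max(home_sizes) - min(home_sizes)

print(f"Mean home size: {mean_size:,.0f} sq ft")
print(f"Range: {size_range:,} sq ft")
```

Two numbers now summarize the whole list: the mean gives a sense of the typical size, and the range gives a sense of how spread out the sizes are.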

### 2. Inferential statistics

The second branch of statistics is inferential statistics.

This branch of statistics uses samples to make inferences about populations.

A population is the entire group of individuals or items that we’re interested in studying.

A sample is a subset of the population that we use to draw inferences about the population.

Example 1: We want to know the average height of a student in a particular school that has 1,000 students. Since it would take a long time to measure the height of each student, we decide to randomly measure 100 students and use the average of these 100 to estimate the average of all 1,000 students.

The 1,000 students represent the population we are interested in.

The 100 students we randomly chose to measure represent the sample we use to draw inferences about the population.
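Example 1 can be sketched in Python as follows. Since no real measurements are given, the code simulates the heights of the 1,000 students; the normal distribution, the 165 cm center, and the 10 cm spread are assumptions made purely for illustration.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is reproducible

# Simulated population: heights (in cm) of all 1,000 students.
# The distribution and its parameters are assumptions, not real data.
population = [random.gauss(165, 10) for _ in range(1000)]

# Simple random sample of 100 students, drawn without replacement.
sample = random.sample(population, 100)

population_mean = mean(population)  # normally unknown in practice
sample_mean = mean(sample)          # our estimate of the population mean

print(f"Population mean: {population_mean:.1f} cm")
print(f"Sample mean:     {sample_mean:.1f} cm")
```

In a real study we would only have the sample mean. The simulation computes the population mean as well, so the two can be compared.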

Example 2: We want to know what percentage of people support a new law in a certain city with a population of 50,000 people. Since it’s unreasonable to call every single person, we instead randomly call 500 people and ask whether or not they support the new law. We use the percentage of people in this sample of 500 who say they support the new law to estimate what percentage of the total 50,000 people in the city support the new law.

The 50,000 people represent the population we are interested in.

The 500 people we randomly call represent the sample we use to draw inferences about the population.
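A similar sketch works for Example 2. The number of supporters in the city (30,000 out of 50,000) is an assumption invented for this simulation; in a real poll that figure is exactly what we do not know and are trying to estimate.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

CITY_SIZE = 50_000
SAMPLE_SIZE = 500
NUM_SUPPORTERS = 30_000  # assumed for the simulation; unknown in a real poll

# Simulated city: True means the person supports the law, False means not.
city = [True] * NUM_SUPPORTERS + [False] * (CITY_SIZE - NUM_SUPPORTERS)

# Randomly "call" 500 people, without calling anyone twice.
called = random.sample(city, SAMPLE_SIZE)

# The share of supporters in the sample is our estimate for the whole city.
estimate = sum(called) / SAMPLE_SIZE

print(f"Estimated support: {estimate:.1%}")
print(f"Actual support:    {NUM_SUPPORTERS / CITY_SIZE:.1%}")
```

The estimate will rarely match the actual share exactly, but with a random sample of 500 it is usually within a few percentage points.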

#### Why Use Samples?

We use samples to draw inferences about populations because it’s often too time-consuming and expensive to gather data on entire populations. It’s easier, faster, and cheaper to gather data on a small sample of the population and use that sample to make inferences about the larger population.

## What is the Point of Learning Statistics?

Data exists on pretty much everything: the height of athletes, the size of houses, the number of crimes in an area, the temperatures in a given city over time, the size of populations in different countries, scores on standardized tests, the number of flu cases each year, etc.

Descriptive statistics help us summarize this data, which helps us understand trends and patterns.

Inferential statistics let us use samples to draw inferences about larger populations.

In essence, statistics is the field that helps us understand data, which helps us better understand the world around us.