This tutorial explains how to find confidence intervals in R for means and proportions.

**What is a Confidence Interval?**

Suppose we want to know the mean height of a student at a school that has 1,000 students. Since it would take too long to measure the height of every student, we instead take a simple random sample of 100 students and calculate the mean height of students in this sample.

In this example, the mean height of students in the sample is our **statistic** and the actual mean height of students in the entire population is the **parameter** we are trying to estimate using our sample.

The sample mean is known as a **point estimate**, which is a single value used to estimate a population parameter. And while this value might be a good estimate of the true population mean, there is no guarantee that the sample mean is exactly equal to the population mean.

For example, the mean height of students in our sample might be 66 inches while the actual mean height of students at this school may be 68 inches. In other words, our point estimate doesn’t account for any uncertainty we may have.

One way to account for this uncertainty is to use a **confidence interval**, which is a range of values that is likely to contain the true population parameter at a chosen level of confidence.
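To make "confidence" concrete, the sketch below repeats the sampling process many times and checks how often a 95% interval captures the true mean. For illustration only, it assumes that heights follow a normal distribution with mean 68 inches and standard deviation 3 inches. These are hypothetical values, and the population is treated as infinite rather than as a school of 1,000. Under those assumptions, the proportion of intervals that contain 68 should come out close to 0.95.

```r
# Hypothetical population: heights ~ Normal(mean = 68, sd = 3), chosen only
# for illustration (infinite population, not the 1,000-student school)
set.seed(1)
true_mean <- 68
n_reps <- 2000   # number of repeated samples
n <- 100         # size of each simple random sample

covered <- logical(n_reps)
for (i in seq_len(n_reps)) {
  # draw one sample of 100 heights
  sample_heights <- rnorm(n, mean = true_mean, sd = 3)
  # 95% t-based confidence interval for the mean
  half_width <- qt(0.975, df = n - 1) * sd(sample_heights) / sqrt(n)
  lower <- mean(sample_heights) - half_width
  upper <- mean(sample_heights) + half_width
  # record whether this interval captured the true mean
  covered[i] <- lower <= true_mean && true_mean <= upper
}

# share of intervals that contain the true mean (should be near 0.95)
coverage <- mean(covered)
coverage
```

Any single interval either contains the parameter or it does not. The 95% describes how often the procedure succeeds across repeated samples.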

A confidence interval consists of three parts:

- A **sample statistic** (often a sample mean or sample proportion)
- A **standard error** (for a mean, the sample standard deviation divided by the square root of the sample size)
- A **critical value** (determined by the confidence level we choose; for example, the z critical value for a 95% confidence level is about 1.96)
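In R, critical values come from quantile functions: `qnorm()` for the standard normal distribution and `qt()` for the t distribution. For a two-sided interval, half of the leftover probability goes in each tail. The minimal sketch below computes both kinds of critical value. The 19 degrees of freedom are an assumption chosen only to match a sample of size 20.

```r
# confidence levels to compare
conf_levels <- c(0.90, 0.95, 0.99)

# two-sided z critical values: (1 - conf) / 2 in each tail
z_crit <- qnorm(1 - (1 - conf_levels) / 2)

# two-sided t critical values with 19 degrees of freedom (sample size 20)
t_crit <- qt(1 - (1 - conf_levels) / 2, df = 19)

data.frame(conf_level = conf_levels,
           z = round(z_crit, 3),
           t_df19 = round(t_crit, 3))
```

The t critical values are larger than the z values at every level. This reflects the extra uncertainty that comes from estimating the standard deviation from a small sample.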

The formula to create a confidence interval is:

**Confidence Interval** = **sample statistic** +/- (**critical value** * **standard error**)
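The general formula translates directly into a small R function. `conf_int()` is a hypothetical helper written for this sketch, not a base R function. The numbers in the call are illustrative only.

```r
# Hypothetical helper (not part of base R): returns the limits of
# statistic +/- critical value * standard error
conf_int <- function(statistic, std_error, crit_value) {
  margin <- crit_value * std_error  # margin of error
  c(lower = statistic - margin, upper = statistic + margin)
}

# illustrative inputs: a statistic of 50, a standard error of 2,
# and the 95% z critical value
conf_int(statistic = 50, std_error = 2, crit_value = qnorm(0.975))
```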

**Confidence Interval for a Mean**

The formula to find a confidence interval for a population mean is:

$$\bar{x} \pm t_{n-1} \cdot \frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean, $t_{n-1}$ is the t critical value for the chosen confidence level from the t distribution with $n-1$ degrees of freedom, $s$ is the sample standard deviation, and $n$ is the sample size.
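As a sketch, the formula can be wrapped in a reusable function. `mean_ci()` is a hypothetical name, not part of base R. It derives the t critical value from the requested confidence level instead of hard-coding a quantile such as 0.975.

```r
# Hypothetical helper: t-based confidence interval for a population mean
mean_ci <- function(x, conf_level = 0.95) {
  n <- length(x)
  x_bar <- mean(x)      # sample mean
  s <- sd(x)            # sample standard deviation
  se <- s / sqrt(n)     # standard error of the mean
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)
  c(lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)
}

# usage on a small illustrative vector
mean_ci(c(2, 4, 6, 8), conf_level = 0.95)
```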

The following example illustrates how to find a confidence interval for a mean in R.

**Example**

We are interested in finding the mean height of tomato plants in a region. Since it would take too long to measure the height of every plant in the region, we decide to take a simple random sample of 20 plants and find that their heights (in inches) are as follows:

14, 17, 12, 12, 14, 13, 15, 16, 19, 22, 23, 21, 14, 16, 16, 17, 19, 13, 13, 14

Use this sample to construct a 95% confidence interval for the mean height of tomato plants in this region.

```r
# create a vector of plant heights
heights <- c(14, 17, 12, 12, 14, 13, 15, 16, 19, 22, 23, 21, 14, 16, 16, 17, 19, 13, 13, 14)

# find sample mean (x), sample standard deviation (s), and sample size (n)
x <- mean(heights)
s <- sd(heights)
n <- length(heights)

# find critical value based on t-distribution with n-1 degrees of freedom
crit_t <- qt(0.975, df = n - 1)

# find lower and upper limits of 95% confidence interval
lower <- x - crit_t * (s / sqrt(n))
upper <- x + crit_t * (s / sqrt(n))

# output confidence interval
CI <- c(lower, upper)
CI
#[1] 14.45895 17.54105
```

The 95% confidence interval for the mean height (in inches) of tomato plants in this region is **(14.45895, 17.54105)**.
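Base R's `t.test()` reports the same t-based interval for a single sample. This makes it a convenient cross-check on the step-by-step calculation. The `conf.level` argument defaults to 0.95 and is written out here for clarity.

```r
heights <- c(14, 17, 12, 12, 14, 13, 15, 16, 19, 22, 23, 21, 14, 16, 16, 17, 19, 13, 13, 14)

# one-sample t test; conf.int holds the confidence interval for the mean
result <- t.test(heights, conf.level = 0.95)
result$conf.int
```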

**Confidence Interval for a Difference in Means**

The formula to find a confidence interval for the difference between two population means is:

$$(\bar{x}_1 - \bar{x}_2) \pm z^{*} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the means of sample 1 and sample 2, $s_1^2$ and $s_2^2$ are the variances of sample 1 and sample 2, $n_1$ and $n_2$ are the sample sizes of sample 1 and sample 2, and $z^{*}$ is the z critical value for the chosen confidence level.

The following example illustrates how to find a confidence interval for the difference between two population means in R.

**Example**

Researchers want to know whether a new diet helps people lose weight. They randomly assign 100 people to group 1, who follow the new diet, and another 100 people to group 2, who stay on their normal diet. After three months, the mean weight loss for group 1 was 8 pounds with a standard deviation of 2 pounds, and the mean weight loss for group 2 was 6 pounds with a standard deviation of 3 pounds.

Construct a 90% confidence interval for the difference in mean weight loss of group 1 – group 2.

```r
# define mean, standard deviation, and sample size for each group
x1 <- 8
x2 <- 6
s1 <- 2
s2 <- 3
n1 <- 100
n2 <- 100

# find difference between means
mean_diff <- x1 - x2

# find critical value based on normal distribution (90% leaves 5% in each tail)
crit_z <- qnorm(0.95)

# find standard error of the difference between the two means
se <- sqrt((s1^2) / n1 + (s2^2) / n2)

# find lower and upper limits of 90% confidence interval
lower <- mean_diff - crit_z * se
upper <- mean_diff + crit_z * se

# output confidence interval
CI <- c(lower, upper)
CI
#[1] 1.40694 2.59306
```

The 90% confidence interval for the difference in mean weight loss (in lbs) of group 1 – group 2 is **(1.40694, 2.59306)**.
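If you need this interval for several comparisons or confidence levels, a small function avoids retyping the steps. `diff_means_ci()` is a hypothetical helper for this sketch. Like the example above, it uses a z critical value and works from summary statistics rather than raw data.

```r
# Hypothetical helper: z-based interval for (mean 1 - mean 2)
# computed from summary statistics
diff_means_ci <- function(x1, x2, s1, s2, n1, n2, conf_level = 0.95) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)            # standard error of the difference
  z_crit <- qnorm(1 - (1 - conf_level) / 2)    # two-sided z critical value
  mean_diff <- x1 - x2
  c(lower = mean_diff - z_crit * se, upper = mean_diff + z_crit * se)
}

# reproduce the 90% interval from the diet example
diff_means_ci(x1 = 8, x2 = 6, s1 = 2, s2 = 3, n1 = 100, n2 = 100, conf_level = 0.90)
```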

**Confidence Interval for a Proportion**

The formula to find a confidence interval for a population proportion is:

$$p \pm z^{*} \cdot \sqrt{\frac{p(1-p)}{n}}$$

where $p$ is the sample proportion, $n$ is the sample size, and $z^{*}$ is the z critical value for the chosen confidence level.

The following example illustrates how to find a confidence interval for a population proportion in R.

**Example**

There are 500 students at a certain school. The principal of the school wants to estimate what proportion of all students prefer chocolate milk over regular milk in the cafeteria. He takes a simple random sample of 50 students and finds that 20 of the students prefer chocolate milk.

Based on this sample, find a 99% confidence interval for the proportion of students at this school who prefer chocolate milk over regular milk.

```r
# define sample proportion and sample size
p <- 20/50
n <- 50

# define z critical value to use (99% leaves 0.5% in each tail)
crit_z <- qnorm(0.995)

# find standard error of the sample proportion
se <- sqrt(p * (1 - p) / n)

# find lower and upper limits of 99% confidence interval
lower <- p - crit_z * se
upper <- p + crit_z * se

# output confidence interval
CI <- c(lower, upper)
CI
#[1] 0.2215413 0.5784587
```

The 99% confidence interval for the proportion of students at this school who prefer chocolate milk over regular milk is **(0.2215, 0.5785)**.
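The same calculation can be packaged as a function that takes a count of "successes" and a sample size. `prop_ci()` is a hypothetical helper that implements the normal-approximation (Wald) formula shown above. Base R's `prop.test()` uses a different method (a Wilson score interval, with a continuity correction by default), so its limits will not match this one exactly.

```r
# Hypothetical helper: normal-approximation (Wald) interval for a proportion
prop_ci <- function(successes, n, conf_level = 0.95) {
  p_hat <- successes / n                     # sample proportion
  se <- sqrt(p_hat * (1 - p_hat) / n)        # standard error
  z_crit <- qnorm(1 - (1 - conf_level) / 2)  # two-sided z critical value
  c(lower = p_hat - z_crit * se, upper = p_hat + z_crit * se)
}

# 20 of 50 students prefer chocolate milk; 99% interval
prop_ci(successes = 20, n = 50, conf_level = 0.99)
```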

**Confidence Interval for a Difference in Proportions**

The formula to find a confidence interval for the difference between two population proportions is:

$$(p_1 - p_2) \pm z^{*} \cdot \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

where $p_1$ and $p_2$ are the proportions in sample 1 and sample 2, $n_1$ and $n_2$ are the sample sizes of sample 1 and sample 2, and $z^{*}$ is the z critical value for the chosen confidence level.

The following example illustrates how to find a confidence interval for the difference between two population proportions in R.

**Example**

A researcher wants to know what percentage of students at two different universities study for more than one hour per night. He takes a simple random sample of 100 students from each school and finds that 40% of students at school 1 study for more than one hour per night and 35% of students at school 2 study for more than one hour per night.

Construct a 95% confidence interval for the difference between the proportion of students who study for more than one hour per night at these two universities.

```r
# define sample proportions and sample sizes
p1 <- 0.40
p2 <- 0.35
n1 <- 100
n2 <- 100

# find difference in proportions
p <- p1 - p2

# define z critical value to use (95% leaves 2.5% in each tail)
crit_z <- qnorm(0.975)

# find standard error of the difference in proportions
se <- sqrt((p1 * (1 - p1) / n1) + (p2 * (1 - p2) / n2))

# find lower and upper limits of 95% confidence interval
lower <- p - crit_z * se
upper <- p + crit_z * se

# output confidence interval
CI <- c(lower, upper)
CI
#[1] -0.08401052 0.18401052
```

The 95% confidence interval for the difference between the proportion of students who study for more than one hour per night at these two universities is **(-.084, .184)**.
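Higher confidence levels require larger critical values, so the interval gets wider. The sketch below recomputes the study-habits interval at three confidence levels. Only the 95% level comes from the example; the 90% and 99% levels are added here for comparison.

```r
# sample proportions and sample sizes from the study-habits example
p1 <- 0.40
p2 <- 0.35
n1 <- 100
n2 <- 100

p_diff <- p1 - p2                                     # difference in proportions
se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # standard error of the difference

# two-sided z critical values for three confidence levels
conf_levels <- c(0.90, 0.95, 0.99)
z_crit <- qnorm(1 - (1 - conf_levels) / 2)

intervals <- data.frame(
  conf_level = conf_levels,
  lower = p_diff - z_crit * se,
  upper = p_diff + z_crit * se
)
intervals$width <- intervals$upper - intervals$lower
intervals
```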