A confidence interval is a range of values that is likely to contain a population parameter with a certain level of confidence.

It is calculated using the following general formula:

**Confidence Interval** = (point estimate) +/- (critical value)*(standard error)

This formula produces a lower bound and an upper bound for the population parameter:

**Confidence Interval** = [lower bound, upper bound]
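The general formula can be sketched as a small helper function. This is an illustration only: the name `ci_general` and its arguments are hypothetical, not part of base R, and the numbers passed to it are made up. The sketch also shows how a critical value follows from the confidence level: for a 95% interval, α = 0.05, so the two-sided critical value is the 1 - α/2 = 0.975 quantile.

```r
# Hypothetical helper (not a base R function): builds an interval from
# a point estimate, a critical value, and a standard error
ci_general <- function(estimate, critical, se) {
  margin <- critical * se
  c(lower = estimate - margin, upper = estimate + margin)
}

# A 95% confidence level corresponds to alpha = 0.05
conf_level <- 0.95
alpha <- 1 - conf_level

# Two-sided critical values sit at the 1 - alpha/2 quantile
z_crit <- qnorm(1 - alpha/2)         # standard normal critical value
t_crit <- qt(1 - alpha/2, df = 24)   # t critical value with 24 degrees of freedom

# Made-up inputs: point estimate of 50 with a standard error of 2
ci_z <- ci_general(estimate = 50, critical = z_crit, se = 2)
ci_z
```

The t critical value is always larger than the z critical value at the same confidence level, so t-based intervals are somewhat wider, especially for small samples.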

This tutorial explains how to calculate the following confidence intervals in R:

**1.** Confidence Interval for a Mean

**2.** Confidence Interval for a Difference in Means

**3.** Confidence Interval for a Proportion

**4.** Confidence Interval for a Difference in Proportions

Let’s jump in!

**Example 1: Confidence Interval for a Mean**

We use the following formula to calculate a confidence interval for a mean:

**Confidence Interval = x +/- t_{n-1, 1-α/2}*(s/√n)**

where:

- **x:** sample mean
- **t:** the t-critical value with n-1 degrees of freedom for the chosen confidence level
- **s:** sample standard deviation
- **n:** sample size

**Example:** Suppose we collect a random sample of turtles with the following information:

- Sample size **n = 25**
- Sample mean weight **x = 300**
- Sample standard deviation **s = 18.5**

The following code shows how to calculate a 95% confidence interval for the true population mean weight of turtles:

    #input sample size, sample mean, and sample standard deviation
    n <- 25
    xbar <- 300
    s <- 18.5

    #calculate margin of error
    margin <- qt(0.975,df=n-1)*s/sqrt(n)

    #calculate lower and upper bounds of confidence interval
    low <- xbar - margin
    low
    # [1] 292.3636

    high <- xbar + margin
    high
    # [1] 307.6364

The 95% confidence interval for the true population mean weight of turtles is **[292.36, 307.64]**.
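When the raw observations are available rather than only summary statistics, the same interval can be obtained from the built-in `t.test()` function. The sketch below uses made-up weights (they are not the data behind the example above) and checks the manual formula against the `conf.int` element that `t.test()` returns.

```r
# Made-up raw weights for illustration only
weights <- c(285, 310, 296, 322, 301, 289, 315, 298, 307, 293)

n    <- length(weights)
xbar <- mean(weights)
s    <- sd(weights)

# Manual interval: xbar +/- t * s / sqrt(n)
margin <- qt(0.975, df = n - 1) * s / sqrt(n)
manual <- c(xbar - margin, xbar + margin)

# Built-in equivalent: the interval is stored in $conf.int
builtin <- t.test(weights, conf.level = 0.95)$conf.int

manual
builtin
```

Both approaches should print the same two bounds; `t.test()` is usually more convenient when the raw data are at hand.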

**Example 2: Confidence Interval for a Difference in Means**

We use the following formula to calculate a confidence interval for a difference in population means:

**Confidence interval** = (x_{1}–x_{2}) +/- t*√((s_{p}^{2}/n_{1}) + (s_{p}^{2}/n_{2}))

where:

- x_{1}, x_{2}: sample 1 mean, sample 2 mean
- t: the t-critical value based on the confidence level and (n_{1}+n_{2}-2) degrees of freedom
- s_{p}^{2}: pooled variance, calculated as ((n_{1}-1)s_{1}^{2} + (n_{2}-1)s_{2}^{2}) / (n_{1}+n_{2}-2)
- n_{1}, n_{2}: sample 1 size, sample 2 size

**Example:** Suppose we want to estimate the difference in mean weight between two different species of turtles, so we go out and gather a random sample of 15 turtles from each population. Here is the summary data for each sample:

**Sample 1:**

- x_{1} = 310
- s_{1} = 18.5
- n_{1} = 15

**Sample 2:**

- x_{2} = 300
- s_{2} = 16.4
- n_{2} = 15

The following code shows how to calculate a 95% confidence interval for the true difference in population means:

    #input sample size, sample mean, and sample standard deviation
    n1 <- 15
    xbar1 <- 310
    s1 <- 18.5
    n2 <- 15
    xbar2 <- 300
    s2 <- 16.4

    #calculate pooled variance
    sp <- ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2)

    #calculate margin of error (t-critical value uses n1+n2-2 degrees of freedom)
    margin <- qt(0.975,df=n1+n2-2)*sqrt(sp/n1 + sp/n2)

    #calculate lower and upper bounds of confidence interval
    low <- (xbar1-xbar2) - margin
    low
    # [1] -3.075728

    high <- (xbar1-xbar2) + margin
    high
    # [1] 23.07573

The 95% confidence interval for the true difference in population means is **[-3.08, 23.08]**. Because this interval contains zero, the data do not rule out the possibility that the two species have the same mean weight.
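With raw data for both samples, the pooled-variance interval can also be obtained from `t.test()` by setting `var.equal = TRUE`. The sketch below uses made-up weights for two species (not the data behind the example above) and compares the manual formula, with n1 + n2 - 2 degrees of freedom, against the built-in result.

```r
# Made-up weights for two species (illustration only)
species1 <- c(312, 298, 325, 301, 318, 290, 309, 330)
species2 <- c(295, 288, 310, 302, 279, 305, 291, 299)

n1 <- length(species1)
n2 <- length(species2)

# Pooled variance and margin of error with n1 + n2 - 2 degrees of freedom
sp2 <- ((n1 - 1) * var(species1) + (n2 - 1) * var(species2)) / (n1 + n2 - 2)
margin <- qt(0.975, df = n1 + n2 - 2) * sqrt(sp2 / n1 + sp2 / n2)
diff_means <- mean(species1) - mean(species2)
manual <- c(diff_means - margin, diff_means + margin)

# var.equal = TRUE requests the pooled-variance interval
pooled <- t.test(species1, species2, var.equal = TRUE)$conf.int

# The default (var.equal = FALSE) gives Welch's interval, which does not
# assume equal population variances
welch <- t.test(species1, species2)$conf.int

manual
pooled
welch
```

The manual and pooled results should agree; Welch's interval will generally differ slightly and is often the safer choice when the two sample variances look quite different.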

**Example 3: Confidence Interval for a Proportion**

We use the following formula to calculate a confidence interval for a proportion:

**Confidence Interval = p +/- z*√(p(1-p) / n)**

where:

- **p:** sample proportion
- **z:** the chosen z-value
- **n:** sample size

**Example:** Suppose we want to estimate the proportion of residents in a county that are in favor of a certain law. We select a random sample of 100 residents and ask them about their stance on the law. Here are the results:

- Sample size **n = 100**
- Proportion in favor of law **p = 0.56**

The following code shows how to calculate a 95% confidence interval for the true proportion of residents in the entire county who are in favor of the law:

    #input sample size and sample proportion
    n <- 100
    p <- .56

    #calculate margin of error
    margin <- qnorm(0.975)*sqrt(p*(1-p)/n)

    #calculate lower and upper bounds of confidence interval
    low <- p - margin
    low
    # [1] 0.4627099

    high <- p + margin
    high
    # [1] 0.6572901

The 95% confidence interval for the true proportion of residents in the entire county who are in favor of the law is **[.463, .657]**.
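The formula above is the normal-approximation (Wald) interval. As a rough sketch, it can be wrapped in a helper and compared with two base R alternatives. The name `wald_ci` is hypothetical, not a built-in function. Note that `prop.test()` and `binom.test()` use different methods (a score-based interval and an exact binomial interval, respectively), so their bounds will not match the Wald result exactly.

```r
# Hypothetical helper for the normal-approximation (Wald) interval
wald_ci <- function(p, n, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  margin <- z * sqrt(p * (1 - p) / n)
  c(lower = p - margin, upper = p + margin)
}

# Same inputs as the example: 56 of 100 residents in favor
wald <- wald_ci(p = 0.56, n = 100)

# Score-based (Wilson) interval, without continuity correction
wilson <- prop.test(x = 56, n = 100, correct = FALSE)$conf.int

# Exact (Clopper-Pearson) interval
exact <- binom.test(x = 56, n = 100)$conf.int

wald
wilson
exact
```

For a sample of this size and a proportion near 0.5 the three intervals are close; the differences tend to matter more for small samples or proportions near 0 or 1.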

**Example 4: Confidence Interval for a Difference in Proportions**

We use the following formula to calculate a confidence interval for a difference in proportions:

**Confidence interval = (p_{1}–p_{2}) +/- z*√(p_{1}(1-p_{1})/n_{1} + p_{2}(1-p_{2})/n_{2})**

where:

- p_{1}, p_{2}: sample 1 proportion, sample 2 proportion
- z: the z-critical value based on the confidence level
- n_{1}, n_{2}: sample 1 size, sample 2 size

**Example:** Suppose we want to estimate the difference in the proportion of residents who support a certain law in county A compared to the proportion who support the law in county B. Here is the summary data for each sample:

**Sample 1:**

- n_{1} = 100
- p_{1} = 0.62 (i.e. 62 out of 100 residents support the law)

**Sample 2:**

- n_{2} = 100
- p_{2} = 0.46 (i.e. 46 out of 100 residents support the law)

The following code shows how to calculate a 95% confidence interval for the true difference in proportion of residents who support the law between the counties:

    #input sample sizes and sample proportions
    n1 <- 100
    p1 <- .62
    n2 <- 100
    p2 <- .46

    #calculate margin of error
    margin <- qnorm(0.975)*sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)

    #calculate lower and upper bounds of confidence interval
    low <- (p1-p2) - margin
    low
    # [1] 0.02364509

    high <- (p1-p2) + margin
    high
    # [1] 0.2963549

The 95% confidence interval for the true difference in proportion of residents who support the law between the counties is **[.024, .296]**.
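As a cross-check, the same interval can be requested from the built-in `prop.test()` function by passing the counts of supporters and the sample sizes for both counties. The sketch below assumes `correct = FALSE`, which turns off the continuity correction that `prop.test()` applies by default; with the correction left on, the interval would be slightly wider.

```r
# Counts of supporters and sample sizes for county A and county B
x <- c(62, 46)
n <- c(100, 100)
p <- x / n

# Manual interval from the formula above
margin <- qnorm(0.975) * sqrt(sum(p * (1 - p) / n))
manual <- c((p[1] - p[2]) - margin, (p[1] - p[2]) + margin)

# Built-in two-sample interval without continuity correction
builtin <- prop.test(x = x, n = n, correct = FALSE)$conf.int

manual
builtin
```

Both results should match the bounds computed in the example.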
