How to Find Confidence Intervals in R (With Examples)


A confidence interval is a range of values that is likely to contain a population parameter with a certain level of confidence.

It is calculated using the following general formula:

Confidence Interval = (point estimate)  +/-  (critical value)*(standard error)

This formula produces an interval defined by a lower bound and an upper bound:

Confidence Interval  = [lower bound, upper bound]
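As a minimal sketch of this general formula, the helper below takes a point estimate, a critical value, and a standard error and returns both bounds. The name ci_bounds is hypothetical (it is not a base R function), and the numbers in the usage lines are made up purely for illustration.

```r
# Hypothetical helper: builds an interval from the three pieces of the general formula
ci_bounds <- function(estimate, critical, se) {
  margin <- critical * se  # margin of error
  c(lower = estimate - margin, upper = estimate + margin)
}

# Illustrative values only: estimate = 50, standard error = 2, 95% z-critical value
ci_example <- ci_bounds(estimate = 50, critical = qnorm(0.975), se = 2)
ci_example
```

Each of the four examples in this tutorial follows this pattern; only the point estimate, the critical value, and the standard error change.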

This tutorial explains how to calculate the following confidence intervals in R:

1. Confidence Interval for a Mean

2. Confidence Interval for a Difference in Means

3. Confidence Interval for a Proportion

4. Confidence Interval for a Difference in Proportions

Let’s jump in!

Example 1: Confidence Interval for a Mean

We use the following formula to calculate a confidence interval for a mean:

Confidence Interval = x̄  +/-  t(n-1, 1-α/2) * (s/√n)

where:

  • x̄: sample mean
  • t(n-1, 1-α/2): the t-critical value with n-1 degrees of freedom for confidence level 1-α
  • s: sample standard deviation
  • n: sample size

Example: Suppose we collect a random sample of turtles with the following information:

  • Sample size n = 25
  • Sample mean weight x̄ = 300
  • Sample standard deviation s = 18.5

The following code shows how to calculate a 95% confidence interval for the true population mean weight of turtles:

#input sample size, sample mean, and sample standard deviation
n <- 25
xbar <- 300 
s <- 18.5

#calculate margin of error
margin <- qt(0.975,df=n-1)*s/sqrt(n)

#calculate lower and upper bounds of confidence interval
low <- xbar - margin
low

[1] 292.3636

high <- xbar + margin
high

[1] 307.6364

The 95% confidence interval for the true population mean weight of turtles is [292.36, 307.64].
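When the raw measurements are available rather than only summary statistics, base R's t.test() function reports the same t-based interval. The sketch below uses simulated weights (the vector weights and the seed are assumptions for illustration, not data from this tutorial) and compares the manual formula with the built-in result.

```r
# Simulated data for illustration only (not the turtle sample described above)
set.seed(1)
weights <- rnorm(25, mean = 300, sd = 18.5)

# Manual calculation using the formula from this example
n <- length(weights)
margin <- qt(0.975, df = n - 1) * sd(weights) / sqrt(n)
manual_ci <- c(mean(weights) - margin, mean(weights) + margin)

# Built-in calculation: t.test() stores the interval in its conf.int element
builtin_ci <- as.numeric(t.test(weights, conf.level = 0.95)$conf.int)

manual_ci
builtin_ci
```

The two vectors should agree, since t.test() uses the same t-critical value with n-1 degrees of freedom.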

Example 2: Confidence Interval for a Difference in Means

We use the following formula to calculate a confidence interval for a difference in population means:

Confidence interval = (x̄1 - x̄2)  +/-  t*√((sp²/n1) + (sp²/n2))

where:

  • x̄1, x̄2: sample 1 mean, sample 2 mean
  • t: the t-critical value based on the confidence level and (n1+n2-2) degrees of freedom
  • sp²: pooled variance, calculated as ((n1-1)s1² + (n2-1)s2²) / (n1+n2-2)
  • n1, n2: sample 1 size, sample 2 size

Example: Suppose we want to estimate the difference in mean weight between two different species of turtles, so we go out and gather a random sample of 15 turtles from each population. Here is the summary data for each sample:

Sample 1:

  • x̄1 = 310
  • s1 = 18.5
  • n1 = 15

Sample 2:

  • x̄2 = 300
  • s2 = 16.4
  • n2 = 15

The following code shows how to calculate a 95% confidence interval for the true difference in population means:

#input sample size, sample mean, and sample standard deviation
n1 <- 15
xbar1 <- 310 
s1 <- 18.5

n2 <- 15
xbar2 <- 300
s2 <- 16.4

#calculate pooled variance
sp2 <- ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2)

#calculate margin of error using n1+n2-2 degrees of freedom
margin <- qt(0.975,df=n1+n2-2)*sqrt(sp2/n1 + sp2/n2)

#calculate lower and upper bounds of confidence interval
low <- (xbar1-xbar2) - margin
low

[1] -3.075728

high <- (xbar1-xbar2) + margin
high

[1] 23.07573

The 95% confidence interval for the true difference in population means is [-3.08, 23.08].
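With raw data in hand, the pooled-variance interval can also be obtained from t.test() by setting var.equal = TRUE. The sketch below uses simulated samples (species1, species2, and the seed are assumptions for illustration, not the turtle data above) and checks the manual formula, with n1+n2-2 degrees of freedom, against the built-in result.

```r
# Simulated data for illustration only
set.seed(2)
species1 <- rnorm(15, mean = 310, sd = 18.5)
species2 <- rnorm(15, mean = 300, sd = 16.4)

n1 <- length(species1)
n2 <- length(species2)

# Pooled variance and margin of error, following the formula in this example
sp2 <- ((n1 - 1) * var(species1) + (n2 - 1) * var(species2)) / (n1 + n2 - 2)
margin <- qt(0.975, df = n1 + n2 - 2) * sqrt(sp2 / n1 + sp2 / n2)
diff_means <- mean(species1) - mean(species2)
manual_ci <- c(diff_means - margin, diff_means + margin)

# Built-in pooled-variance interval for (mean of species1 - mean of species2)
pooled_ci <- as.numeric(t.test(species1, species2, var.equal = TRUE)$conf.int)

manual_ci
pooled_ci
```

Note that t.test() defaults to var.equal = FALSE, which gives the Welch interval and does not assume equal population variances; that interval will generally differ slightly from the pooled one.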

Example 3: Confidence Interval for a Proportion

We use the following formula to calculate a confidence interval for a proportion:

Confidence Interval = p  +/-  z*√(p(1-p) / n)

where:

  • p: sample proportion
  • z: the z-critical value based on the confidence level
  • n: sample size

Example: Suppose we want to estimate the proportion of residents in a county that are in favor of a certain law. We select a random sample of 100 residents and ask them about their stance on the law. Here are the results:

  • Sample size n = 100
  • Proportion in favor of law p = 0.56

The following code shows how to calculate a 95% confidence interval for the true proportion of residents in the entire county who are in favor of the law:

#input sample size and sample proportion
n <- 100
p <- .56

#calculate margin of error
margin <- qnorm(0.975)*sqrt(p*(1-p)/n)

#calculate lower and upper bounds of confidence interval
low <- p - margin
low

[1] 0.4627099

high <- p + margin
high

[1] 0.6572901

The 95% confidence interval for the true proportion of residents in the entire county who are in favor of the law is [.463, .657].
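The formula used here is the normal-approximation (Wald) interval. Base R also offers prop.test(), whose one-sample interval is the Wilson score interval, and binom.test(), which returns the exact Clopper-Pearson interval. The sketch below compares the three, assuming the sample proportion of 0.56 corresponds to 56 of the 100 residents being in favor.

```r
x <- 56   # residents in favor (assumed count implied by p = 0.56 and n = 100)
n <- 100  # sample size
p <- x / n

# Normal-approximation (Wald) interval, as in the formula for this example
wald_ci <- p + c(-1, 1) * qnorm(0.975) * sqrt(p * (1 - p) / n)

# Wilson score interval from prop.test(), with the continuity correction turned off
wilson_ci <- as.numeric(prop.test(x, n, conf.level = 0.95, correct = FALSE)$conf.int)

# Exact (Clopper-Pearson) interval from binom.test()
exact_ci <- as.numeric(binom.test(x, n, conf.level = 0.95)$conf.int)

# One row per method: lower bound, upper bound
rbind(wald = wald_ci, wilson = wilson_ci, exact = exact_ci)
```

The three methods tend to give similar answers for moderate sample sizes and proportions near 0.5, and can diverge more when n is small or p is close to 0 or 1.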

Example 4: Confidence Interval for a Difference in Proportions

We use the following formula to calculate a confidence interval for a difference in proportions:

Confidence interval = (p1 - p2)  +/-  z*√(p1(1-p1)/n1 + p2(1-p2)/n2)

where:

  • p1, p2: sample 1 proportion, sample 2 proportion
  • z: the z-critical value based on the confidence level
  • n1, n2: sample 1 size, sample 2 size

Example: Suppose we want to estimate the difference in the proportion of residents who support a certain law in county A compared to the proportion who support the law in county B. Here is the summary data for each sample:

Sample 1:

  • n1 = 100
  • p1 = 0.62 (i.e. 62 out of 100 residents support the law)

Sample 2:

  • n2 = 100
  • p2 = 0.46 (i.e. 46 out of 100 residents support the law)

The following code shows how to calculate a 95% confidence interval for the true difference in proportion of residents who support the law between the counties:

#input sample sizes and sample proportions
n1 <- 100
p1 <- .62

n2 <- 100
p2 <- .46

#calculate margin of error
margin <- qnorm(0.975)*sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)

#calculate lower and upper bounds of confidence interval
low <- (p1-p2) - margin
low

[1] 0.02364509


high <- (p1-p2) + margin
high

[1] 0.2963549

The 95% confidence interval for the true difference in proportion of residents who support the law between the counties is [.024, .296].
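As a cross-check, the same interval can be requested from base R's prop.test() by passing the success counts and sample sizes for both groups. The sketch below uses the counts stated in this example (62 of 100 and 46 of 100) and turns off the continuity correction, which prop.test() applies by default, so that the result lines up with the formula above.

```r
successes <- c(62, 46)    # supporters in county A and county B
sizes     <- c(100, 100)  # sample sizes

# Manual calculation using the formula from this example
p <- successes / sizes
margin <- qnorm(0.975) * sqrt(sum(p * (1 - p) / sizes))
manual_ci <- c(p[1] - p[2] - margin, p[1] - p[2] + margin)

# Built-in interval for (proportion 1 - proportion 2), no continuity correction
builtin_ci <- as.numeric(prop.test(successes, sizes, conf.level = 0.95, correct = FALSE)$conf.int)

manual_ci
builtin_ci
```

Leaving correct at its default of TRUE would widen the built-in interval slightly relative to the manual calculation.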
