A confidence interval is a range of values that is likely to contain a population parameter with a certain level of confidence.
It is calculated using the following general formula:
Confidence Interval = (point estimate) +/- (critical value)*(standard error)
The formula produces a lower bound and an upper bound, and the interval between them is what we report:
Confidence Interval = [lower bound, upper bound]
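The general formula can be sketched as a small R helper. The function name make_ci and the input values below are hypothetical, chosen only to illustrate how the three pieces combine:

```r
# Hypothetical helper: builds an interval from the three pieces of the
# general formula (point estimate, critical value, standard error)
make_ci <- function(estimate, critical, se) {
  margin <- critical * se
  c(lower = estimate - margin, upper = estimate + margin)
}

# Illustrative values only, not taken from any dataset
ci <- make_ci(estimate = 50, critical = 1.96, se = 2)
ci
```

With these inputs the margin of error is 1.96 * 2 = 3.92, so the interval is [46.08, 53.92].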
This tutorial explains how to calculate the following confidence intervals in R:
1. Confidence Interval for a Mean
2. Confidence Interval for a Difference in Means
3. Confidence Interval for a Proportion
4. Confidence Interval for a Difference in Proportions
Let’s jump in!
Example 1: Confidence Interval for a Mean
We use the following formula to calculate a confidence interval for a mean:
Confidence Interval = x +/- t_(n-1, 1-α/2) * (s/√n)
where:
- x: sample mean
- t_(n-1, 1-α/2): the t-critical value with n-1 degrees of freedom for confidence level 1-α
- s: sample standard deviation
- n: sample size
Example: Suppose we collect a random sample of turtles with the following information:
- Sample size n = 25
- Sample mean weight x = 300
- Sample standard deviation s = 18.5
The following code shows how to calculate a 95% confidence interval for the true population mean weight of turtles:
#input sample size, sample mean, and sample standard deviation
n <- 25
xbar <- 300
s <- 18.5

#calculate margin of error
margin <- qt(0.975,df=n-1)*s/sqrt(n)

#calculate lower and upper bounds of confidence interval
low <- xbar - margin
low

[1] 292.3636

high <- xbar + margin
high

[1] 307.6364
The 95% confidence interval for the true population mean weight of turtles is [292.36, 307.64].
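If you have the raw observations rather than summary statistics, the built-in t.test() function reports the same kind of interval. The sketch below uses a hypothetical vector of weights (not the turtle sample above) and checks that the manual formula and t.test() agree:

```r
# Hypothetical raw weights; these are NOT the data from the example above
weights <- c(288, 305, 312, 296, 301, 279, 318, 294, 307, 300)

# Manual calculation using the formula for a mean
n <- length(weights)
margin <- qt(0.975, df = n - 1) * sd(weights) / sqrt(n)
manual <- c(mean(weights) - margin, mean(weights) + margin)

# Built-in calculation; conf.int carries a conf.level attribute,
# so as.numeric() is used to strip it before comparing
builtin <- as.numeric(t.test(weights, conf.level = 0.95)$conf.int)

# Should return TRUE if the two approaches agree
all.equal(manual, builtin)
```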
Example 2: Confidence Interval for a Difference in Means
We use the following formula to calculate a confidence interval for a difference in population means:
Confidence interval = (x1 - x2) +/- t*√((sp^2/n1) + (sp^2/n2))
where:
- x1, x2: sample 1 mean, sample 2 mean
- t: the t-critical value based on the confidence level and (n1+n2-2) degrees of freedom
- sp^2: pooled variance, calculated as ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2), where s1 and s2 are the sample standard deviations
- n1, n2: sample 1 size, sample 2 size
Example: Suppose we want to estimate the difference in mean weight between two different species of turtles, so we go out and gather a random sample of 15 turtles from each population. Here is the summary data for each sample:
Sample 1:
- x1 = 310
- s1 = 18.5
- n1 = 15
Sample 2:
- x2 = 300
- s2 = 16.4
- n2 = 15
The following code shows how to calculate a 95% confidence interval for the true difference in population means:
#input sample size, sample mean, and sample standard deviation
n1 <- 15
xbar1 <- 310
s1 <- 18.5
n2 <- 15
xbar2 <- 300
s2 <- 16.4

#calculate pooled variance
sp2 <- ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2)

#calculate margin of error (n1+n2-2 degrees of freedom)
margin <- qt(0.975,df=n1+n2-2)*sqrt(sp2/n1 + sp2/n2)

#calculate lower and upper bounds of confidence interval
low <- (xbar1-xbar2) - margin
low

[1] -3.075728

high <- (xbar1-xbar2) + margin
high

[1] 23.07573

The 95% confidence interval for the true difference in population means is [-3.08, 23.08]. Since this interval contains zero, the data are consistent with no difference in mean weight between the two species.
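With raw data for both groups, t.test() with var.equal = TRUE uses the same pooled-variance approach and (n1+n2-2) degrees of freedom. The sketch below uses two hypothetical vectors of weights, not the samples above, to check the manual formula against the built-in result:

```r
# Hypothetical raw weights for two species; NOT the data from the example above
species1 <- c(312, 298, 325, 301, 317, 290, 308, 321)
species2 <- c(295, 310, 287, 302, 299, 281, 306, 293)

n1 <- length(species1)
n2 <- length(species2)

# Pooled variance and margin of error, following the formula above
sp2 <- ((n1 - 1) * var(species1) + (n2 - 1) * var(species2)) / (n1 + n2 - 2)
margin <- qt(0.975, df = n1 + n2 - 2) * sqrt(sp2 / n1 + sp2 / n2)
diff_means <- mean(species1) - mean(species2)
manual <- c(diff_means - margin, diff_means + margin)

# var.equal = TRUE requests the pooled-variance interval;
# the default (FALSE) would use the Welch approximation instead
builtin <- as.numeric(t.test(species1, species2, var.equal = TRUE)$conf.int)

# Should return TRUE if the two approaches agree
all.equal(manual, builtin)
```

Note that the pooled formula assumes the two populations have equal variances. When that assumption is doubtful, the default Welch interval from t.test() is a common alternative.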
Example 3: Confidence Interval for a Proportion
We use the following formula to calculate a confidence interval for a proportion:
Confidence Interval = p +/- z*√(p(1-p)/n)
where:
- p: sample proportion
- z: the z-critical value based on the confidence level
- n: sample size
Example: Suppose we want to estimate the proportion of residents in a county that are in favor of a certain law. We select a random sample of 100 residents and ask them about their stance on the law. Here are the results:
- Sample size n = 100
- Proportion in favor of law p = 0.56
The following code shows how to calculate a 95% confidence interval for the true proportion of residents in the entire county who are in favor of the law:
#input sample size and sample proportion
n <- 100
p <- .56

#calculate margin of error
margin <- qnorm(0.975)*sqrt(p*(1-p)/n)

#calculate lower and upper bounds of confidence interval
low <- p - margin
low

[1] 0.4627099

high <- p + margin
high

[1] 0.6572901
The 95% confidence interval for the true proportion of residents in the entire county who are in favor of the law is [.463, .657].
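To see how the confidence level affects the width of the interval, the calculation can be wrapped in a function. This is only a sketch; the name prop_ci and its conf argument are hypothetical and not part of base R:

```r
# Hypothetical helper implementing the formula above for any confidence level
prop_ci <- function(p, n, conf = 0.95) {
  alpha <- 1 - conf
  z <- qnorm(1 - alpha / 2)               # z-critical value
  margin <- z * sqrt(p * (1 - p) / n)     # margin of error
  c(lower = p - margin, upper = p + margin)
}

# Same sample as above (n = 100, p = 0.56) at three confidence levels
ci90 <- prop_ci(0.56, 100, conf = 0.90)
ci95 <- prop_ci(0.56, 100, conf = 0.95)
ci99 <- prop_ci(0.56, 100, conf = 0.99)
rbind(ci90, ci95, ci99)
```

Higher confidence levels give wider intervals, since a larger z-critical value is needed to cover more of the sampling distribution.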
Example 4: Confidence Interval for a Difference in Proportions
We use the following formula to calculate a confidence interval for a difference in proportions:
Confidence interval = (p1–p2) +/- z*√(p1(1-p1)/n1 + p2(1-p2)/n2)
where:
- p1, p2: sample 1 proportion, sample 2 proportion
- z: the z-critical value based on the confidence level
- n1, n2: sample 1 size, sample 2 size
Example: Suppose we want to estimate the difference in the proportion of residents who support a certain law in county A compared to the proportion who support the law in county B. Here is the summary data for each sample:
Sample 1:
- n1 = 100
- p1 = 0.62 (i.e. 62 out of 100 residents support the law)
Sample 2:
- n2 = 100
- p2 = 0.46 (i.e. 46 out of 100 residents support the law)
The following code shows how to calculate a 95% confidence interval for the true difference in proportion of residents who support the law between the counties:
#input sample sizes and sample proportions
n1 <- 100
p1 <- .62
n2 <- 100
p2 <- .46

#calculate margin of error
margin <- qnorm(0.975)*sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)

#calculate lower and upper bounds of confidence interval
low <- (p1-p2) - margin
low

[1] 0.02364509

high <- (p1-p2) + margin
high

[1] 0.2963549
The 95% confidence interval for the true difference in proportion of residents who support the law between the counties is [.024, .296].
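As a cross-check, the built-in prop.test() function can compute an interval for a difference in proportions from the raw counts. The sketch below assumes the counts implied by the example (62 and 46 supporters out of 100 each) and turns off the continuity correction so that the result is comparable to the formula above:

```r
# Counts of supporters and sample sizes implied by the example above
supporters <- c(62, 46)
sizes <- c(100, 100)

# correct = FALSE disables the continuity correction, which would
# otherwise widen the interval relative to the manual formula
result <- prop.test(supporters, sizes, conf.level = 0.95, correct = FALSE)

# conf.int carries a conf.level attribute; as.numeric() strips it
ci <- as.numeric(result$conf.int)
ci
```

With the correction left on (the default), prop.test() would return a slightly wider interval than the manual calculation.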