# How to Perform a t-test with Unequal Sample Sizes

One question students often have in statistics is:

Is it possible to perform a t-test when the sample sizes of each group are not equal?

Yes, you can perform a t-test when the sample sizes are not equal. Equal sample sizes are not one of the assumptions made in a t-test.

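The real issue arises when the two samples do not have equal variances. Equality of variances is one of the assumptions of the standard independent samples t-test (Student’s t-test), which pools the two sample variances into a single estimate, and the test is least robust to violations of this assumption when the sample sizes are also unequal.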

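When this occurs, it’s generally recommended to use Welch’s t-test instead, since it does not assume equal variances. In R, `t.test()` runs Welch’s test by default (`var.equal=FALSE`); setting `var.equal=TRUE` gives the pooled test.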
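The difference between the two tests comes down to how the standard error of the difference in means is estimated. As a minimal sketch (the helper name `manual_t_tests` is hypothetical, not part of any package), the R code below computes Student’s pooled-variance t statistic and Welch’s t statistic, along with the Welch–Satterthwaite degrees of freedom, directly from two vectors, and compares them with `t.test()` on simulated samples of unequal size.

```r
# Hypothetical helper: computes Student's (pooled) and Welch's t statistic
# and degrees of freedom for two independent samples x and y
manual_t_tests <- function(x, y) {
  n1 <- length(x); n2 <- length(y)
  v1 <- var(x);    v2 <- var(y)
  d_mean <- mean(x) - mean(y)

  # Student's t-test: one pooled variance, each group weighted by n - 1
  sp2 <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
  t_pooled <- d_mean / sqrt(sp2 * (1 / n1 + 1 / n2))
  df_pooled <- n1 + n2 - 2

  # Welch's t-test: each group keeps its own variance
  q1 <- v1 / n1; q2 <- v2 / n2  # squared standard errors of each mean
  t_welch <- d_mean / sqrt(q1 + q2)
  # Welch-Satterthwaite approximation for the degrees of freedom
  df_welch <- (q1 + q2)^2 / (q1^2 / (n1 - 1) + q2^2 / (n2 - 1))

  c(t_pooled = t_pooled, df_pooled = df_pooled,
    t_welch = t_welch, df_welch = df_welch)
}

# Check the hand calculations against t.test() on unequal-sized samples
set.seed(1)
x <- rnorm(500, mean = 80, sd = 5)
y <- rnorm(20, mean = 85, sd = 5)

res <- manual_t_tests(x, y)
student <- t.test(x, y, var.equal = TRUE)
welch <- t.test(x, y, var.equal = FALSE)

res
c(student = unname(student$statistic), welch = unname(welch$statistic))
```

If the formulas are right, each hand-computed value should agree with the corresponding `statistic` and `parameter` returned by `t.test()` up to floating-point rounding.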

The following examples demonstrate how to perform t-tests with unequal sample sizes when the variances are equal and when they’re not equal.

## Example 1: Unequal Sample Sizes and Equal Variances

Suppose we administer two programs designed to help students score higher on some exam.

The results are as follows:

Program 1:

- n (sample size): 500
- x̄ (sample mean): 80
- s (sample standard deviation): 5

Program 2:

- n (sample size): 20
- x̄ (sample mean): 85
- s (sample standard deviation): 5

The following code shows how to create a boxplot in R to visualize the distribution of exam scores for each program:

```r
#make this example reproducible
set.seed(1)

#create vectors to hold exam scores
program1 <- rnorm(500, mean=80, sd=5)
program2 <- rnorm(20, mean=85, sd=5)

#create boxplots to visualize distribution of exam scores
boxplot(program1, program2, names=c("Program 1","Program 2"))
```

The mean exam score for Program 2 appears to be higher, but the variance of exam scores between the two programs is roughly equal.
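The boxplot is a visual check; a quick numerical counterpart is to compare the sample standard deviations directly. The sketch below regenerates the Example 1 data, reports each group’s standard deviation and their ratio, and then runs base R’s `var.test()`, an F test of equal variances. Treat the F test with caution: it is sensitive to departures from normality and has limited power with a group of only 20 observations, which is one reason a common recommendation is to skip the pre-test and use Welch’s t-test whenever in doubt. The names `sds`, `sd_ratio`, and `vt` are illustrative only.

```r
# Regenerate the Example 1 data
set.seed(1)
program1 <- rnorm(500, mean = 80, sd = 5)
program2 <- rnorm(20, mean = 85, sd = 5)

# Sample standard deviation of each group and their ratio (larger / smaller)
sds <- c(program1 = sd(program1), program2 = sd(program2))
sds
sd_ratio <- max(sds) / min(sds)
sd_ratio

# F test of H0: the two population variances are equal
vt <- var.test(program1, program2)
vt$p.value
```

Changing `sd = 5` to `sd = 25` for `program1` gives the corresponding check for Example 2.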

The following code shows how to perform an independent samples t-test along with Welch’s t-test:

```r
#perform independent samples t-test
t.test(program1, program2, var.equal=TRUE)

##  Two Sample t-test
##
## data:  program1 and program2
## t = -3.3348, df = 518, p-value = 0.0009148
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.111504 -1.580245
## sample estimates:
## mean of x mean of y
##  80.11322  83.95910

#perform Welch's two sample t-test
t.test(program1, program2, var.equal=FALSE)

##  Welch Two Sample t-test
##
## data:  program1 and program2
## t = -3.3735, df = 20.589, p-value = 0.00293
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.219551 -1.472199
## sample estimates:
## mean of x mean of y
##  80.11322  83.95910
```

The independent samples t-test returns a p-value of .0009 and Welch’s t-test returns a p-value of .0029.

Since the p-value of each test is less than .05, we would reject the null hypothesis in each test and conclude that there is a statistically significant difference in mean exam scores between the two programs.

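Even though the sample sizes are unequal, the independent samples t-test and Welch’s t-test return similar results (t = -3.33 versus t = -3.37, with nearly identical confidence intervals) and lead to the same conclusion, because the two samples have roughly equal variances.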
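Rather than reading the p-values off the printed output, you can extract them from the object that `t.test()` returns (its `statistic`, `parameter`, and `p.value` components) and apply the .05 decision rule in code. A minimal sketch that regenerates the Example 1 data; the names `alpha`, `results`, and `reject` are illustrative only.

```r
# Regenerate the Example 1 data
set.seed(1)
program1 <- rnorm(500, mean = 80, sd = 5)
program2 <- rnorm(20, mean = 85, sd = 5)

alpha <- 0.05  # significance level used in the text

# Run both versions of the test and keep t, df and the p-value of each
results <- sapply(c(student = TRUE, welch = FALSE), function(equal_var) {
  tt <- t.test(program1, program2, var.equal = equal_var)
  c(t = unname(tt$statistic), df = unname(tt$parameter), p = tt$p.value)
})
results  # one column per test

# TRUE means the null hypothesis of equal means is rejected at level alpha
reject <- results["p", ] < alpha
reject
```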

## Example 2: Unequal Sample Sizes and Unequal Variances

Suppose we administer two programs designed to help students score higher on some exam.

The results are as follows:

Program 1:

- n (sample size): 500
- x̄ (sample mean): 80
- s (sample standard deviation): 25

Program 2:

- n (sample size): 20
- x̄ (sample mean): 85
- s (sample standard deviation): 5

The following code shows how to create a boxplot in R to visualize the distribution of exam scores for each program:

```r
#make this example reproducible
set.seed(1)

#create vectors to hold exam scores
program1 <- rnorm(500, mean=80, sd=25)
program2 <- rnorm(20, mean=85, sd=5)

#create boxplots to visualize distribution of exam scores
boxplot(program1, program2, names=c("Program 1","Program 2"))
```

The mean exam score for Program 2 appears to be higher, but the variance of exam scores for Program 1 is much higher than that of Program 2.

The following code shows how to perform an independent samples t-test along with Welch’s t-test:

```r
#perform independent samples t-test
t.test(program1, program2, var.equal=TRUE)

##  Two Sample t-test
##
## data:  program1 and program2
## t = -0.5988, df = 518, p-value = 0.5496
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -14.52474   7.73875
## sample estimates:
## mean of x mean of y
##   80.5661   83.9591

#perform Welch's two sample t-test
t.test(program1, program2, var.equal=FALSE)

##  Welch Two Sample t-test
##
## data:  program1 and program2
## t = -2.1338, df = 74.934, p-value = 0.03613
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.560690 -0.225296
## sample estimates:
## mean of x mean of y
##   80.5661   83.9591
```

The independent samples t-test returns a p-value of .5496 and Welch’s t-test returns a p-value of .0361.

The independent samples t-test does not detect a difference in mean exam scores, but Welch’s t-test detects a statistically significant difference at the .05 level.

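Because the two samples have unequal variances, only Welch’s t-test detects the statistically significant difference in mean exam scores, since it does not assume equal variances. The independent samples t-test pools the two sample variances, weighting each by its degrees of freedom (499 versus 19), so Program 1’s much larger variance dominates the pooled estimate and is then applied to Program 2’s mean as well. Dividing the difference in sample means (80.5661 - 83.9591 = -3.393) by each t statistic shows the effect: the standard error of the difference is about 5.67 for the independent samples t-test but only about 1.59 for Welch’s t-test. In general, when the larger sample has the larger variance, as here, the pooled test tends to be conservative and lose power; when the smaller sample has the larger variance, it tends to reject a true null hypothesis too often.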
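The effect of pooling can be made concrete. The minimal sketch below regenerates the Example 2 data and computes how much weight each sample’s variance receives in the pooled estimate, and the standard error of the difference in means under each test; the variable names are illustrative only.

```r
# Regenerate the Example 2 data
set.seed(1)
program1 <- rnorm(500, mean = 80, sd = 25)
program2 <- rnorm(20, mean = 85, sd = 5)

n1 <- length(program1); n2 <- length(program2)
v1 <- var(program1);    v2 <- var(program2)

# Weight of each sample variance in the pooled variance: (n - 1) / (n1 + n2 - 2)
weights <- c(program1 = n1 - 1, program2 = n2 - 1) / (n1 + n2 - 2)
weights  # about 0.963 and 0.037

# Standard error of the difference in means under each test
sp2 <- sum(weights * c(v1, v2))            # pooled variance
se_pooled <- sqrt(sp2 * (1 / n1 + 1 / n2)) # Student's t-test
se_welch <- sqrt(v1 / n1 + v2 / n2)        # Welch's t-test
c(pooled = se_pooled, welch = se_welch)
```

Dividing `mean(program1) - mean(program2)` by `se_pooled` or `se_welch` reproduces the t statistic of the corresponding test.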