How to Calculate Conditional Probability in R


The conditional probability that event A occurs, given that event B has occurred, is calculated as follows:

P(A|B) = P(A∩B) / P(B)

where:

P(A∩B) = the probability that event A and event B both occur.

P(B) = the probability that event B occurs.
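As a minimal sketch, we could wrap this formula in a small helper function. The function name cond_prob and its argument names are hypothetical (they are not part of base R), and the input values below are made up purely for illustration:

```r
# hypothetical helper (not part of base R): P(A|B) = P(A and B) / P(B)
cond_prob <- function(p_a_and_b, p_b) {
  # P(B) must be positive, and P(A and B) can never exceed P(B)
  stopifnot(p_b > 0, p_a_and_b >= 0, p_a_and_b <= p_b)
  p_a_and_b / p_b
}

# made-up values for illustration: P(A and B) = 0.2, P(B) = 0.5
cond_prob(0.2, 0.5)
```

The stopifnot() check is a design choice for this sketch: it stops with an error if the inputs could not be valid probabilities for this formula, rather than returning a misleading number.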

The following examples show how to use this formula to calculate conditional probabilities in R.

Example 1: Calculate Conditional Probability Using Values

Suppose we send out a survey to 300 individuals asking them which sport they like best: baseball, basketball, football, or soccer.

Suppose we know that the probability that an individual is male and prefers baseball as their favorite sport is 0.113.

Suppose we also know that the probability that any individual prefers baseball as their favorite sport is 0.226.

Given that an individual prefers baseball, we can calculate the probability that they’re male as follows:

  • P(Male|Prefers Baseball) = P(Male∩Prefers Baseball) / P(Prefers Baseball)
  • P(Male|Prefers Baseball) = 0.113 / 0.226
  • P(Male|Prefers Baseball) = 0.5

Given that an individual prefers baseball, the probability that they’re male is 0.5.

Here’s how we can calculate this probability in R:

#define probability of being male and preferring baseball
p_male_baseball <- 0.113

#define probability of preferring baseball
p_baseball <- 0.226

#calculate probability of being male, given that individual prefers baseball
p_male_baseball / p_baseball

[1] 0.5

Example 2: Calculate Conditional Probability Using a Table

Suppose we send out a survey to 300 individuals asking them which sport they like best: baseball, basketball, football, or soccer.

We can create the following table in R to hold the survey responses:

#create data frame to hold survey responses
df <- data.frame(gender=rep(c('Male', 'Female'), each=150),
                 sport=rep(c('Baseball', 'Basketball', 'Football', 'Soccer',
                             'Baseball', 'Basketball', 'Football', 'Soccer'),
                           times=c(34, 40, 58, 18, 34, 52, 20, 44)))

#create two-way table from data frame
survey_data <- addmargins(table(df$gender, df$sport))

#view table
survey_data

         Baseball Basketball Football Soccer  Sum
  Female       34         52       20     44  150
  Male         34         40       58     18  150
  Sum          68         92       78     62  300

We can use the following syntax to extract values from the table:

#extract value in second row (Male) and first column (Baseball)
survey_data[2, 1]

[1] 34
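Because table() sorts the row and column labels alphabetically, positions such as [2, 1] can be easy to misread. As a sketch of an alternative, we can index the same table by its row and column names. The block below rebuilds the survey table from the example so that it runs on its own:

```r
# rebuild the survey data from the example
df <- data.frame(gender=rep(c('Male', 'Female'), each=150),
                 sport=rep(c('Baseball', 'Basketball', 'Football', 'Soccer',
                             'Baseball', 'Basketball', 'Football', 'Soccer'),
                           times=c(34, 40, 58, 18, 34, 52, 20, 44)))

# two-way table with row and column totals
survey_data <- addmargins(table(df$gender, df$sport))

# extract the count of males who prefer baseball by name instead of position
male_baseball <- survey_data['Male', 'Baseball']
male_baseball  # 34, the same value as survey_data[2, 1]
```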

We can use the following syntax to calculate the probability that an individual is male, given that they prefer baseball as their favorite sport:

#calculate probability of being male, given that individual prefers baseball
survey_data[2, 1] / survey_data[3, 1]

[1] 0.5
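Dividing the two counts works because the total number of respondents cancels out of the formula P(A|B) = P(A∩B) / P(B). The sketch below makes that step explicit by first converting the counts to probabilities. The variable names are illustrative, and the table is rebuilt so the block runs on its own:

```r
# rebuild the survey table from the example
df <- data.frame(gender=rep(c('Male', 'Female'), each=150),
                 sport=rep(c('Baseball', 'Basketball', 'Football', 'Soccer',
                             'Baseball', 'Basketball', 'Football', 'Soccer'),
                           times=c(34, 40, 58, 18, 34, 52, 20, 44)))
survey_data <- addmargins(table(df$gender, df$sport))

# total number of respondents (bottom-right cell of the table)
n <- survey_data['Sum', 'Sum']

# P(Male and Prefers Baseball) and P(Prefers Baseball)
p_male_and_baseball <- survey_data['Male', 'Baseball'] / n
p_baseball <- survey_data['Sum', 'Baseball'] / n

# P(Male | Prefers Baseball): same result as dividing the counts directly
p_male_given_baseball <- p_male_and_baseball / p_baseball
p_male_given_baseball
```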

And we can use the following syntax to calculate the probability that an individual prefers basketball as their favorite sport, given that they’re female:

#calculate probability of preferring basketball, given that individual is female
survey_data[1, 2] / survey_data[1, 5]

[1] 0.3466667

We can use this basic approach to calculate any conditional probability we’d like from the table.
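If we want every conditional probability at once, one option is the base R function prop.table(), which divides each cell of a table by its row or column total. This is offered as a sketch of an alternative rather than a replacement for the approach above. Note that it should be applied to the table without the margins added by addmargins():

```r
# rebuild the survey data from the example
df <- data.frame(gender=rep(c('Male', 'Female'), each=150),
                 sport=rep(c('Baseball', 'Basketball', 'Football', 'Soccer',
                             'Baseball', 'Basketball', 'Football', 'Soccer'),
                           times=c(34, 40, 58, 18, 34, 52, 20, 44)))

# two-way table of counts, without the Sum row and column
counts <- table(df$gender, df$sport)

# margin=2 divides by column totals, so each column holds P(gender | sport)
p_gender_given_sport <- prop.table(counts, margin=2)

# margin=1 divides by row totals, so each row holds P(sport | gender)
p_sport_given_gender <- prop.table(counts, margin=1)

# P(Male | Prefers Baseball) = 34 / 68
p_gender_given_sport['Male', 'Baseball']

# P(Prefers Basketball | Female) = 52 / 150
p_sport_given_gender['Female', 'Basketball']
```

These two values match the 0.5 and 0.3466667 calculated above.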

Additional Resources

The following tutorials provide additional information on dealing with probability:

Law of Total Probability
How to Find the Mean of a Probability Distribution
How to Find the Standard Deviation of a Probability Distribution
