Simulating Probability in R: The Classic Monty Hall Problem

Image by Arek Socha from Pixabay 

Introduction

Over the course of this post, we’re going to learn about using simulation to understand probability and we’ll use the classic example of the Monty Hall gameshow problem.

Monty Hall had a gameshow back in the day, where he showcased the following problem. If you’re not familiar with him or they game it was also referenced in 2008’s 21. Anyways, if you’re not familiar here’s the problem.

The Monty Hall Problem

He would give his contestant three doors to choose from. Behind two of the doors were goats… but behind 1 of them was a sports car.

So let’s say it’s you on the show. Monty asks you to choose a door, and if you’re right you get the car.

Now you ask yourself, what is the likelihood that any given door has the car behind it?

You and I both know you have a 33.3% chance of getting the car regardless of which door you choose.

Because it doesn’t matter, let’s say you choose door number 1. but before he reveals what’s behind, he stops you and reveals a goat sitting behind one of the two remaining doors. (He always knows which of the two has the goat if one of them indeed has the car)

He now asks you, would you like to stick with your original choice, or switch to the other unopened door?

The typical thinking is that it still doesn’t matter as you now have a 50/50 chance of selecting the door with the car behind it; however the reality of the situation is there’s actually a 66.7% chance that the car is behind the other door. The probability assigned to the two other doors now condenses to a single door now that you know that the car is not behind one of them.

Conceptually, it’s kind of a hard thing to wrap your mind around and we won’t spend more time trying to convince you why conceptually. From here, we are going to simulate this exact game and observe the probabilities across different scenarios.

Let’s convert this game into some code that we can re-run thousands of times to determine likelihood.

Let’s start to code

First things first. Let’s think about the different variables or placeholders that we’ll need to systematically code out this problem.

We’ll need to include the following:

  • doors
  • winning door
  • first choice
  • revealed door
  • final choice

Doors

Ok let’s get to work creating a vector to represent the doors.

doors <- c(1,2,3)

Easy enough.. vector with the three different choices.

Winning Door

Now let’s randomly select which door has the car. The sample function will pull n (size) of whatever vector you pass it. In this case it will pull either a 1, 2, 3 from doors.

winning_door <- sample(x = doors, size = 1)

First Choice

As I mentioned a moment ago, it’s time for you to choose which door. If you wanted to actually play this game, you could set it up like this to ask which of the three doors you’d want.

which_door <- function(){
  door <- readline("1, 2, or 3?")  
  return(as.numeric(door))
}
first_choice <- which_door()

but we are going to be repeating these steps over and over and over again in a for loop. So instead of forcing a manual input each time, we’re just going to declare first choice as door number 1.

first_choice <- 1

Revealed Door

We now need to determine which of the two remaining doors our program will reveal. We know that Monty would choose one of the two remaining doors.

Let’s think about the cases:

  • in the case that you’ve selected the car, both remaining doors will have goats behind them– so it doesn’t matter
  • if you’ve selected a goat, you just need to ensure you select the door that is not equal to the winning door which you will have just sampled.

We will write this out as an if statement.

So as we just discussed, if the winning door is the first choice, sample the doors vector after removing first_choice. If that is not the case, then remove from the doors vector both the winning_door and the first_choice and the only remaining option will be the remaining goat.

if (winning_door == first_choice){
    revealed_door <- sample(x = doors[-c(first_choice)], size = 1)
  } else {
    revealed_door <- doors[-c(first_choice, winning_door)]
  }

Final Choice

Similar to what we just did, let’s assume that in every case we are going to be stubborn and stick with our door. We are going to resist wisdom and go for it.

final_choice <- first_choice

If you’d rather see results when we take the wise advice ad switch, you can use this code.

final_choice <- doors[-c(first_choice, revealed_door)]

Putting it Al Together

Let’s run through everything we’ve setup so far.

We’ll declare the variables we just went through. I included the two versions of final_choice, but commented out the first version.

The final piece I’ve added here is a conditional output that says hooray if you’ve won and enjoy your goat otherwise.

winning_door <- sample(doors,1)
first_choice <- 1
if (winning_door == first_choice){
  revealed_door <- sample(x = doors[-c(first_choice)], size = 1)
} else {
  revealed_door <- doors[-c(first_choice, winning_door)]
}
#final_choice <- doors[-c(first_choice, revealed_door)]
final_choice <- first_choice
if(final_choice == winning_door){
  print('hooray')
} else {
  print('enjoy your goat')
}

Simulation Time

This is a solid start, but practically speaking, we aren’t going to run this over and over again logging the outcomes. Technically we could, but we certainly don’t have to.

We just have to make a couple of edits.

The first of which is, let’s nest our code in a for loop so we can run it over and over again in sequence.

for(i in 1:10000){
# add code here
}

So that’s going to re-perform the process 10000 times instantaneously, but do we really want to print the results? What we really want to do is log the results, store them somewhere so we can gather the information we’re creating.

What we’re going to need is a counter.

wins <- 0

This will not nest in the for loop as we don’t want the counter to go back to 0 each time it runs for the next record.

What we will include in the for loop is the consideration at the end of each run, “did we win?” and if so we increase the win counter by 1.

wins <- wins + 1

Here it is all assembled.

wins <- 0

for(i in 1:10000){
  winning_door <- sample(doors,1)
  first_choice <- 1
  if (winning_door == first_choice){
    revealed_door <- sample(x = doors[-c(first_choice)], size = 1)
  } else {
    revealed_door <- doors[-c(first_choice, winning_door)]
  }
  # final_choice <- doors[-c(first_choice, revealed_door)]
  final_choice <- first_choice
  if(final_choice == winning_door){
    wins <- wins + 1
  }
}

Now that we’ve now got a count of wins, it’s time to see what percentage of the time you will win if you refuse to change choices.

print(round(wins / 10000, 2))

So rather than the 50% you may have originally thought, the 34% probability originally allocated to just one of the three options remains.

If we run this again switching our decision, we can see that the probability is 67%.

Conclusion

I hope you found this quick lesson on the Monty Hall game insightful. While we are using these principles to model a game, my hope is that you can identify alternative scenarios where this allows you to model probability in meaningful ways.

Feel free to visit me at @data_lessons on Twitter.

Happy Data Science-ing!

Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: