This weekend, on the Public Radio Show, Radiolab (on an episode called STOCHASTICITY), this question was posed: What are the odds that you will get a "string" (or a "run") of 7 Heads in 100 coin flips.
The show spent a lot of time on the question. A Probability "expert" was consulted. The sound of calculator buttons being pressed was recorded. And the answer given was wrong.
The show JUST AIRED THIS FRIDAY here in Seattle (on KUOW) but apparently it first aired in other places more than 3 months ago. There are a number of people who have posted on the show's website that the answer arrived at on the show (and the method used) seem wrong. Many people have posted other answers, and other methods on the show's site, but these are also wrong. [correction: Radiolab seems to be hosting multiple sites that allow comments to be posted on this episode. Both sites contain multiple wrong answers and incorrect methods for solving this problem. One site that I just found has the correct value for the calculation from "Dr. Drang"]
In brief, the probability of such an event occurring is given by the ninth entry in the 1x9 row vector vP^100. Where v = [1, 0, 0, 0, 0, 0, 0, 0, 0] and P is the 9x9 matrix with p_(1,2)=p_(1,3)=.5, and p_(k,2)=p_(k,k+1)=.5 for k=2...8, and p_(9,9)=1 and all other entries zero.
I'd love to try and teach you the method to solve this problem. I had to teach myself a little bit in order to solve the problem, and I thought the concepts themselves were pretty elegant. My explanation is not so elegant (the videos I made are VERY LOW-TECH). But I do think that if you watch it all the way through, you too will be able to solve the correctly the problem that's being solved wrong on Radio stations all over.
the four short videos are posted below.
The show spent a lot of time on the question. A Probability "expert" was consulted. The sound of calculator buttons being pressed was recorded. And the answer given was wrong.
The show JUST AIRED THIS FRIDAY here in Seattle (on KUOW) but apparently it first aired in other places more than 3 months ago. There are a number of people who have posted on the show's website that the answer arrived at on the show (and the method used) seem wrong. Many people have posted other answers, and other methods on the show's site, but these are also wrong. [correction: Radiolab seems to be hosting multiple sites that allow comments to be posted on this episode. Both sites contain multiple wrong answers and incorrect methods for solving this problem. One site that I just found has the correct value for the calculation from "Dr. Drang"]
In brief, the probability of such an event occurring is given by the ninth entry in the 1x9 row vector vP^100. Where v = [1, 0, 0, 0, 0, 0, 0, 0, 0] and P is the 9x9 matrix with p_(1,2)=p_(1,3)=.5, and p_(k,2)=p_(k,k+1)=.5 for k=2...8, and p_(9,9)=1 and all other entries zero.
I'd love to try and teach you the method to solve this problem. I had to teach myself a little bit in order to solve the problem, and I thought the concepts themselves were pretty elegant. My explanation is not so elegant (the videos I made are VERY LOW-TECH). But I do think that if you watch it all the way through, you too will be able to solve the correctly the problem that's being solved wrong on Radio stations all over.
the four short videos are posted below.
The videos show one method of solving the problem (using an absorbing Markov Chain to arrive at the matrix product listed above). Below the videos I've shown how a recurrence can be used to solve the same problem.
(You should consider trying to solve the problem using whatever method you like before watching them)
(You should consider trying to solve the problem using whatever method you like before watching them)
This first clip serves as an introduction to a series of 3 quick videos which, together, show how to solve the question "What is the probability of getting a "streak" (or "run") of 7 (or more) Heads in 100 coin flips.?"
In this clip, an overview is given of the topics covered in the main three videos.
Whoever you are, this Intro is a good starting point. (If you're solid on basic probability, you may want to skip Part 1 after watching this Intro Video.
In this clip, an overview is given of the topics covered in the main three videos.
Whoever you are, this Intro is a good starting point. (If you're solid on basic probability, you may want to skip Part 1 after watching this Intro Video.
In this clip, some BASIC IDEAS of probability are discussed (Independent Events, Mutually Exclusive Events). I would say if you feel like you're solid on those terms, or, if you'd feel confident about finding the probability of getting exactly 3 heads in 10 flips (a problem that has NOTHING TO DO with our problem and is NOT DISCUSSED in this clip....but is a great "test" to see if you're solid on your basic probability) then you should skip to the "part 2" video.
In this clip a SIMULATION (I call it a "board game") is described. The "game" would allow us to answer the question after just taking 100 "turns" with the game. The SPACES on the board represent the different possible "states" we may be in after a certain number of flips, the number of pieces in each space represents the probability of being in that state at a certain point. The PIECES I use are pennies. They're small and they stack well.....and I thought it'd be cute. But if they confuse you here, you should think of them as JELLY BEANS, or something that CAN'T BE FLIPPED. Maybe imagine POKER CHIPS (that are the same on both sides). Their only function is to track different quantities (different probabilities).
In this final part, using a calculator, and matrix multiplication are discussed.
End of the first Method (the Absorbing Markov Chain method)
--------------
Below is a second method used for solving the same problem, this time using Recurrence.
I'll try to type up this method, but it will take a little bit of notation. You are NOT ALLOWED to scroll down and get scared off by the notation. It's true that the notation is UGLY, but each of the different ideas that each bit of notation refers to is really quite simple.
The original question focused on the probability of GETTING a streak of at least 7 Heads in 100 flips of the coin. For this method, we'll focus on the idea of NOT GETTING a streak of at least 7 Heads in 100 flips of the coin.
The notation N_f will represent the NUMBER of ways that we can flip a coin "f" times and NOT get a streak of 7 heads somewhere in those "f" flips.
(for example, N_20 is the number of ways we can flip a coin 20 times and not get a string of 7 [or more] heads in those 20 flips)
N_f(T) will be the NUMBER of ways that we can flip a coin "f" times and NOT get a streak of 7 heads somewhere in those "f" flips AND get a TAIL for our last flip.
N_f(H) will be the NUMBER of ways that we can flip a coin "f" times and NOT get a streak of 7 heads somewhere in those "f" flips AND get a HEAD for our last flip.
and
N_f(~6H) will be the NUMBER of ways that we can flip a coin "f" times and NOT get a streak of 7 heads somewhere in those "f" flips and NOT have the final 6 flips be Heads.
(we will use this notation with numbers other than 6. For example, N_17(~4H) is the number of ways we can flip a coin 17 times, and NOT get a streak of 7 heads somewhere in those "17" flips and NOT have the final 4 flips be Heads.
This should be all of the notation that we'll need.
Consider now N_3. This is simply the number of ways that we can flip a coin 3 times and NOT get 7 Heads in a row. Well, no matter what happens we can't have a streak of 7 Heads after only 3 flips, and so the answer is 8. Here are the 8 possibilities:
HHH
HHT
HTH
HTT
THH
THT
TTH
TTT
Knowing N_3 how could we think about calculating N_4.
You might intuitively see that N_4 will just be N_3 doubled. So we have
N_4 = 2 times N_3 = N_3 + N_3
It will be useful for our method to instead notice this:
N_4 = N_4(T) + N_4(H)
and then see that N_4(T) is just N_3
(remember, N_4(T) and N_3 are just NUMBERS, for both expressions we are just COUNTING UP the number of ways something can happen. The THINGS were counting I'll write below:
N_3 N_4(T)
HHH HHHT
HHT HHTT
HTH HTHT
HTT HTTT
THH THHT
THT THTT
TTH TTHT
TTT TTTT
notice that there are 8 items in each list. And so N_4(T) = N_3
Similarly, N_4(H) = N_3.
But what happens when we've gone past 7 flips? Again, intuitively we know that this pattern will not hold. Let's look at why.
consider N_20 (or, the number of ways that we can flip a coin 20 times WITHOUT getting a streak of 7 Heads somewhere in those 20 flips)
[Note, there's nothing special about the number 20, everything we're about to do, we could do with any integer larger than 7 in place of 20. This is important as we will use this reasoning to build general formulas later on]
as with N_4 we have that N_20 is just N_20(T) + N_20(H)
(this isn't saying much, it's simply that "all of the ways we could flip a coin 20 times and NOT get a streak of 7 (or more) heads somewhere in those 20 flips, is equal to all such ways that END IN TAILS, and all such ways that END IN HEADS")
The difference is that while N_20(T) is still equal to N_19
(we can take any string of 19 that has no streak of 7 heads, and slap a T on the end to get a string of 20 with no streak of 7 heads)
N_20(H) is NOT equal to N_19
The problem comes from the fact that we might be HITTING our streak on the 20th flip if we add a H onto the end of a string of 19 that ended with 6 Heads.
This then makes clear why we needed that AWFUL notation above.
N_20(H) = N_19(~6H)
and thus N_20 = N_20(T) + N_19(~6H) = N_19 + N_19(~6H)
[it's should be noted that there are "items" being "counted" in both N_19 and N_19(~6H). This "Doubling" is intentional, just as above we had N_4 = N_3 + N_3. The difference now is that we aren't fully "Doubling" but instead "throwing out" some of the items.]
Recall that there's nothing MAGICAL about the number 20 and so we'll write this:
N_20 = N_19 + N_19(~6H)
as this:
N_f = N_[f-1] + N_[f-1](~6H)
for any integer f greater than 7
Call this equation #1
(how much would you love it if there were SUBSCRIPTS on this Blogger)
We need just one more formula like the one we just made above, and then we'll be as good as done.
Consider now:
N_19(~6H)
I want to break up the "items" we're thinking about into two pieces again.
one piece will be
N_19(T)
certainly any string of 19 flips that DOESN'T contain a streak of 7 heads AND ends in tails will also NOT END IN 6 Heads.
The other piece here will be all of the strings of 19 flips that don't contain a streak of 7 heads, AND ends in a HEAD, but does NOT end with 6 heads. (ugh)
....well to find the "items" we are thinking of, we could consider all of the strings of length 18 and just add one head to them....the only catch of course, is that we'd want to THROW OUT, and string of 18 that ends in 5 Heads. Our notation for that is just:
N_18(~5H)
and so we have
N_19(~6H) = N_19(T) + N_18(~5H)
remember that N_19(T) is just N_18 so we have:
N_19(~6H) = N_18+ N_18(~5H)
as there's nothing special about 19 or 18 we have that
N_{f}(~6H) = N_{f-1} + N_{f-1}(~5H)
for any integer f greater than 7
and as there's nothing special about 5 or 6 we have:
N_{f}(~kH) = N_{f}(T) + N_{f-1}(~(k-1)H)
Call this equation #2
Using equation 1 and equation 2 alternately, and repeatedly, we have
N_20 = N_19 + N_19(~6H) [by equation #1]
= N_19 + N_18 + N_18(~5H) [by equation #2]
= N_19 + N_18 + N_17 + N_17(~4H) [by #2]
= N_19 + N_18 + N_17 + N_16 + N_16(~3H) [by #2]
= N_19 + N_18 + N_17 + N_16 + N_15 + N_15(~2H) [by #2]
= N_19 + N_18 + N_17 + N_16 + N_15 + N_14 + N_14(~1H) [by #2]
This final term: N_14(~1H) we recognize as just N_14(T) which we know to be equal to N_13
and so
N_20 = N_19 + N_18 + N_17 + N_16 + N_15 + N_14 + N_13
Again, there's nothing special about 20, and so we can write:
N_f = N_{f-1} + N_{f-2} + N_{f-3}+ N_{f-4} + N_{f-5} + N_{f-6} + N_{f-7}
for any integer f bigger than 7.
From inspection we know that
N_1 = 2
N_2 = 4
N_3 = 8
N_4 = 16
N_5 = 32
N_6 = 64
N_7 is the only one we should slow down for.
We know that there are 128 different results from flipping 1 coin 7 times, but ONE of these (HHHHHHH) will give us a streak of 7 heads, and so
N_7 = 127
Putting these first 7 "seed" values into 7 fields in excel (each one below the previous one), and then asking excel to just make each field after the 7th be the sum of the seven fields above it, will allow you to quickly calculate out N_100.
This number divided by 2^100 will give the probability of NOT getting a streak of 7 Heads in 100 coin flips. Taking 1 minus this number will yield the desired result: approximately 31.752% which is the same number found using the Absorbing Markov Chain method.
--------------
Below is a second method used for solving the same problem, this time using Recurrence.
I'll try to type up this method, but it will take a little bit of notation. You are NOT ALLOWED to scroll down and get scared off by the notation. It's true that the notation is UGLY, but each of the different ideas that each bit of notation refers to is really quite simple.
The original question focused on the probability of GETTING a streak of at least 7 Heads in 100 flips of the coin. For this method, we'll focus on the idea of NOT GETTING a streak of at least 7 Heads in 100 flips of the coin.
The notation N_f will represent the NUMBER of ways that we can flip a coin "f" times and NOT get a streak of 7 heads somewhere in those "f" flips.
(for example, N_20 is the number of ways we can flip a coin 20 times and not get a string of 7 [or more] heads in those 20 flips)
N_f(T) will be the NUMBER of ways that we can flip a coin "f" times and NOT get a streak of 7 heads somewhere in those "f" flips AND get a TAIL for our last flip.
N_f(H) will be the NUMBER of ways that we can flip a coin "f" times and NOT get a streak of 7 heads somewhere in those "f" flips AND get a HEAD for our last flip.
and
N_f(~6H) will be the NUMBER of ways that we can flip a coin "f" times and NOT get a streak of 7 heads somewhere in those "f" flips and NOT have the final 6 flips be Heads.
(we will use this notation with numbers other than 6. For example, N_17(~4H) is the number of ways we can flip a coin 17 times, and NOT get a streak of 7 heads somewhere in those "17" flips and NOT have the final 4 flips be Heads.
This should be all of the notation that we'll need.
Consider now N_3. This is simply the number of ways that we can flip a coin 3 times and NOT get 7 Heads in a row. Well, no matter what happens we can't have a streak of 7 Heads after only 3 flips, and so the answer is 8. Here are the 8 possibilities:
HHH
HHT
HTH
HTT
THH
THT
TTH
TTT
Knowing N_3 how could we think about calculating N_4.
You might intuitively see that N_4 will just be N_3 doubled. So we have
N_4 = 2 times N_3 = N_3 + N_3
It will be useful for our method to instead notice this:
N_4 = N_4(T) + N_4(H)
and then see that N_4(T) is just N_3
(remember, N_4(T) and N_3 are just NUMBERS, for both expressions we are just COUNTING UP the number of ways something can happen. The THINGS were counting I'll write below:
N_3 N_4(T)
HHH HHHT
HHT HHTT
HTH HTHT
HTT HTTT
THH THHT
THT THTT
TTH TTHT
TTT TTTT
notice that there are 8 items in each list. And so N_4(T) = N_3
Similarly, N_4(H) = N_3.
But what happens when we've gone past 7 flips? Again, intuitively we know that this pattern will not hold. Let's look at why.
consider N_20 (or, the number of ways that we can flip a coin 20 times WITHOUT getting a streak of 7 Heads somewhere in those 20 flips)
[Note, there's nothing special about the number 20, everything we're about to do, we could do with any integer larger than 7 in place of 20. This is important as we will use this reasoning to build general formulas later on]
as with N_4 we have that N_20 is just N_20(T) + N_20(H)
(this isn't saying much, it's simply that "all of the ways we could flip a coin 20 times and NOT get a streak of 7 (or more) heads somewhere in those 20 flips, is equal to all such ways that END IN TAILS, and all such ways that END IN HEADS")
The difference is that while N_20(T) is still equal to N_19
(we can take any string of 19 that has no streak of 7 heads, and slap a T on the end to get a string of 20 with no streak of 7 heads)
N_20(H) is NOT equal to N_19
The problem comes from the fact that we might be HITTING our streak on the 20th flip if we add a H onto the end of a string of 19 that ended with 6 Heads.
This then makes clear why we needed that AWFUL notation above.
N_20(H) = N_19(~6H)
and thus N_20 = N_20(T) + N_19(~6H) = N_19 + N_19(~6H)
[it's should be noted that there are "items" being "counted" in both N_19 and N_19(~6H). This "Doubling" is intentional, just as above we had N_4 = N_3 + N_3. The difference now is that we aren't fully "Doubling" but instead "throwing out" some of the items.]
Recall that there's nothing MAGICAL about the number 20 and so we'll write this:
N_20 = N_19 + N_19(~6H)
as this:
N_f = N_[f-1] + N_[f-1](~6H)
for any integer f greater than 7
Call this equation #1
(how much would you love it if there were SUBSCRIPTS on this Blogger)
We need just one more formula like the one we just made above, and then we'll be as good as done.
Consider now:
N_19(~6H)
I want to break up the "items" we're thinking about into two pieces again.
one piece will be
N_19(T)
certainly any string of 19 flips that DOESN'T contain a streak of 7 heads AND ends in tails will also NOT END IN 6 Heads.
The other piece here will be all of the strings of 19 flips that don't contain a streak of 7 heads, AND ends in a HEAD, but does NOT end with 6 heads. (ugh)
....well to find the "items" we are thinking of, we could consider all of the strings of length 18 and just add one head to them....the only catch of course, is that we'd want to THROW OUT, and string of 18 that ends in 5 Heads. Our notation for that is just:
N_18(~5H)
and so we have
N_19(~6H) = N_19(T) + N_18(~5H)
remember that N_19(T) is just N_18 so we have:
N_19(~6H) = N_18+ N_18(~5H)
as there's nothing special about 19 or 18 we have that
N_{f}(~6H) = N_{f-1} + N_{f-1}(~5H)
for any integer f greater than 7
and as there's nothing special about 5 or 6 we have:
N_{f}(~kH) = N_{f}(T) + N_{f-1}(~(k-1)H)
Call this equation #2
Using equation 1 and equation 2 alternately, and repeatedly, we have
N_20 = N_19 + N_19(~6H) [by equation #1]
= N_19 + N_18 + N_18(~5H) [by equation #2]
= N_19 + N_18 + N_17 + N_17(~4H) [by #2]
= N_19 + N_18 + N_17 + N_16 + N_16(~3H) [by #2]
= N_19 + N_18 + N_17 + N_16 + N_15 + N_15(~2H) [by #2]
= N_19 + N_18 + N_17 + N_16 + N_15 + N_14 + N_14(~1H) [by #2]
This final term: N_14(~1H) we recognize as just N_14(T) which we know to be equal to N_13
and so
N_20 = N_19 + N_18 + N_17 + N_16 + N_15 + N_14 + N_13
Again, there's nothing special about 20, and so we can write:
N_f = N_{f-1} + N_{f-2} + N_{f-3}+ N_{f-4} + N_{f-5} + N_{f-6} + N_{f-7}
for any integer f bigger than 7.
From inspection we know that
N_1 = 2
N_2 = 4
N_3 = 8
N_4 = 16
N_5 = 32
N_6 = 64
N_7 is the only one we should slow down for.
We know that there are 128 different results from flipping 1 coin 7 times, but ONE of these (HHHHHHH) will give us a streak of 7 heads, and so
N_7 = 127
Putting these first 7 "seed" values into 7 fields in excel (each one below the previous one), and then asking excel to just make each field after the 7th be the sum of the seven fields above it, will allow you to quickly calculate out N_100.
This number divided by 2^100 will give the probability of NOT getting a streak of 7 Heads in 100 coin flips. Taking 1 minus this number will yield the desired result: approximately 31.752% which is the same number found using the Absorbing Markov Chain method.
I love recursion. It always seems like magic.
ReplyDelete--Jim