Monday, July 21, 2008

The Birthday Problem

One of the most fun statistical results is the Birthday Problem, a favorite of Intro to Probability courses.

Imagine you're at a party with 23 other guests. If you randomly pull aside one of them, the odds he or she will share your birthday are 1 in 365. But the probability that any two people at the party share a birthday is far higher: about 1 in 2. The intuition is that the probability of a shared birthday is not 1/365 times the number of people present. Rather, it is one minus the probability that no pair matches, and that is roughly the probability two people do not share a birthday (364/365) raised to the power of the number of pairs of people. The number of pairs is about (N^2)/2 for a sample of N people. So the probability that no one shares a birthday, with N people, is about (364/365)^[(N^2)/2], which is roughly 48% for N=23.
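A minimal sketch of that arithmetic in Python, assuming N = 23 and the rough (N^2)/2 pair count used above (the function name is just for illustration):

```python
def approx_no_shared_birthday(n, days=365):
    """Rough probability that none of n people share a birthday.

    Treats the roughly n^2/2 pairs as independent, which is only
    an approximation.
    """
    pairs = n ** 2 / 2
    return ((days - 1) / days) ** pairs


p_none = approx_no_shared_birthday(23)
print(f"no shared birthday:   {p_none:.3f}")      # about 0.484
print(f"some shared birthday: {1 - p_none:.3f}")  # about 0.516
```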

Thus it was fun to read lawyers getting all excited about a new piece of research refuting DNA statistics by noting that some people share 7 of 13 gene loci at about 1 million times the frequency presented, a large error. But the error was the researcher's. The quoted odds are for a particular set of 13 loci sharing 9 chromosomes. But state crime lab analyst Kathryn Troyer looked at all the pairs of men in a group of 63,000, giving her about 2 billion pairs, and thought she had discovered a massive rejection of the hypothesis. She also appears to have looked at many different sets of 13 loci, and ignored the fact that some prisoners are related.
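For scale, the pair count is easy to check. A quick Python sketch, taking the 63,000 figure above at face value (math.comb needs Python 3.8 or later):

```python
import math

database_size = 63_000  # group size quoted above
pairs = math.comb(database_size, 2)  # n * (n - 1) / 2 distinct pairs
print(f"{pairs:,}")  # 1,984,468,500
```

With that many comparisons, a match that is very unlikely for any one particular pair can still turn up somewhere in the group, which is the birthday problem again.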

Steven Myers, a senior DNA analyst at the California Department of Justice, gave a presentation to the Assn. of California Crime Lab Analysts, titled "Don't Panic".

Lawyers. They're so cute when they try to do statistics.


John said...


Are you sure you have your math right here?

ed said...

Arguing about this stuff seems especially silly to me in a system that routinely relies on eye-witness testimony as the gold standard of proof. We debate whether the DNA false positive rate is one-in-a-billion or merely one-in-a-million; meanwhile we send people to prison for years based on witness testimony that is probably more like one-in-twenty, if you're lucky.

Eric Falkenstein said...

It's an approximation. So 23 people not sharing a birthday is 23^2/2*(364/365)=48.4%, and so the probability at least 2 people share a birthday is 1-48.4%, or 51.6%. Because of adjustments outside the scope of this post, the actual number is 50.7%, which is close enough.

Anonymous said...

see my math below. why is it wrong??

((23^2) / 2) * (364 / 365) = 263.775342

thanks, Dan

Eric Falkenstein said...

oops. I put n*p, not p^n. tx

Pete Sattler said...

Maybe I missed something, but shouldn't it be:

Pete Sattler said...

I get 1-.462=.538 as the chance that someone has the same birthday.

But a more relevant question is ... shouldn't I be doing something else rather than thinking about this?

Eric Falkenstein said...

Though the practical significance may be zero, there is some confusion on this, at least in my mind:
Using the combination approach assumes the pairs are independent, which they are not. This is usually a second-order error. For example, using the simple # of pairs approach, the probability no two people share a birthday is (364/365)^(n*(n-1)/2)

n(n-1)/2=n!/(2!(n-2)!)=C(n choose 2)

For n=23, this is 0.4995,
so the prob that at least two people share a birthday is 0.5005

The 100% correct approach uses permutations, where the probability no two people share a birthday is P(365,23)/365^23 or 0.493, so the prob that at least two people share a birthday is 0.507
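
For anyone who wants to check these numbers, here is a short Python sketch comparing the pairs approximation with the exact permutation formula. The function names are mine, and math.perm needs Python 3.8 or later.

```python
import math


def pairs_approx(n, days=365):
    """P(no shared birthday), treating all n*(n-1)/2 pairs as independent."""
    pairs = n * (n - 1) // 2
    return ((days - 1) / days) ** pairs


def exact(n, days=365):
    """P(no shared birthday) = P(days, n) / days^n, with P the permutation count."""
    return math.perm(days, n) / days ** n


for n in (23, 24):
    print(n, round(pairs_approx(n), 4), round(exact(n), 4))
```

For n=23 this gives roughly 0.4995 from the pairs approach and 0.4927 from the exact formula. With n=24 (you plus 23 other guests) the exact formula gives about 0.4617, or a 0.538 chance of a shared birthday.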