Monday, July 21, 2008

The Birthday Problem

One of the most fun statistical results is the Birthday Problem, a favorite in Intro to Probability courses.

Imagine you're at a party with 23 other guests. If you randomly pull aside one of them, the odds he or she will share your birthday are 1 in 365. But the probability that some two people at the party share a birthday is far higher: about 1 in 2. The intuition is that the probability of a shared birthday is not 1/365 times the number of people present. Rather, the probability that no one shares a birthday is roughly the probability that two given people do not share one (364/365), raised to the power of the number of pairs of people. The number of pairs is about (N^2)/2 for a group of N people, so the probability that no one shares a birthday is about (364/365)^[(N^2)/2], which for N = 23 is 48%ish.
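As a quick check on the arithmetic, here is a minimal Python sketch of the pair approximation described above. The function name and the choice of N = 23 are my own assumptions for illustration; it uses the rough pair count (N^2)/2 from the text, not the exact count.

```python
def approx_no_shared_birthday(n: int, days: int = 365) -> float:
    """Pair approximation: chance that none of roughly n^2/2 pairs match.

    Treats the pairs as independent, which is only approximately true.
    """
    pairs = n ** 2 / 2  # rough pair count used in the post
    return ((days - 1) / days) ** pairs


p_none = approx_no_shared_birthday(23)  # N = 23 is an assumed group size
print(f"P(no shared birthday)   ~ {p_none:.3f}")
print(f"P(some shared birthday) ~ {1 - p_none:.3f}")
```

The point of raising 364/365 to a power, rather than multiplying 1/365 by a count, is that the result always stays between 0 and 1.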

Thus it was fun to read lawyers getting all excited about a new piece of research refuting DNA statistics by noting that some people share 7 of 13 gene loci at about 1 million times the frequency presented, a large error. But the error was the researcher's. The odds are for a particular set of 13 loci sharing 9 chromosomes. But state crime lab analyst Kathryn Troyer looked at all the combinations of men in a group of 63,000, giving her about 2 billion pairs, and thought she had discovered a massive rejection of the hypothesis. She also appears to have looked at many different sets of 13 loci, and ignored the fact that some prisoners are related.
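To see where the "about 2 billion pairs" comes from, here is a small Python sketch. The pair count follows directly from the 63,000 figure above. The per-pair match probability passed to expected_matches is a purely hypothetical placeholder, not a number from the research, and the helper names are mine.

```python
from math import comb


def pair_count(n: int) -> int:
    """Number of distinct pairs that can be formed from n profiles."""
    return comb(n, 2)


def expected_matches(n: int, p_pair: float) -> float:
    """Expected number of matching pairs, given a per-pair match probability."""
    return pair_count(n) * p_pair


pairs = pair_count(63_000)
print(f"{pairs:,} pairs")  # 1,984,468,500 pairs

# Hypothetical per-pair probability, chosen only to illustrate the scaling.
print(expected_matches(63_000, 1e-9))
```

Even a tiny per-pair probability gets multiplied by roughly 2 billion comparisons, which is the same effect that drives the birthday result.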

Steven Myers, a senior DNA analyst at the California Department of Justice, gave a presentation to the Assn. of California Crime Lab Analysts, titled "Don't Panic".

Lawyers. They're so cute when they try to do statistics.

John said...

(N^2)/2*(364/365)

Are you sure you have your math right here?

ed said...

Arguing about this stuff seems especially silly to me in a system that routinely relies on eye-witness testimony as the gold standard of proof. We debate whether the DNA false positive rate is one-in-a-billion or merely one-in-a-million; meanwhile we send people to prison for years based on witness testimony that is probably more like one-in-twenty, if you're lucky.

Eric Falkenstein said...

It's an approximation. So 23 people not sharing a birthday is 23^2/2*(364/365)=48.4%, and so the probability at least 2 people share a birthday is 1-48.4%, or 51.6%. Because of adjustments outside the scope of this post, the actual number is 50.7%--close enough.

Anonymous said...

Eric,
see my math below. why is it wrong??

((23^2) / 2) * (364 / 365) = 263.775342

thanks, Dan

Eric Falkenstein said...

oops. I put n*p, not p^n. tx

Pete S said...

Maybe I missed something, but shouldn't it be:
364!/{(365**23)*(341!)}

Pete S said...

I get 1-.462=.538 as the chance that someone has the same birthday.

But a more relevant question is ... shouldn't I be doing something else rather than thinking about this?

Eric Falkenstein said...

Though the practical significance may be zero, there is some confusion on this, at least in my mind:
Using the combination approach assumes the pairs are independent, which they are not. This is usually a second-order error. For example, using the simple number-of-pairs approach, the probability no two people share a birthday is (364/365)^(n*(n-1)/2), where the exponent is the number of pairs:

n(n-1)/2 = n!/(2!(n-2)!) = C(n,2)

For n = 23, this is 0.4995,
so the probability that at least two people share a birthday is 0.5005.

The 100% correct approach uses permutations, where the probability no two people share a birthday is P(365,23)/365^23 or 0.493, so the probability of a shared birthday is 0.507.
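For anyone who wants to check these figures, here is a short Python sketch comparing the pair approximation with the exact permutation formula. The function names are my own, and it assumes 365 equally likely birthdays with no leap years.

```python
from math import comb, perm


def approx_no_match(n: int, days: int = 365) -> float:
    """Pair approximation: (364/365) raised to the number of pairs C(n, 2)."""
    return ((days - 1) / days) ** comb(n, 2)


def exact_no_match(n: int, days: int = 365) -> float:
    """Exact probability that n birthdays are all distinct: P(days, n) / days^n."""
    return perm(days, n) / days ** n


for n in (23, 24):
    print(n, round(approx_no_match(n), 4), round(exact_no_match(n), 4))
```

With n = 23 this gives about 0.4995 for the approximation and 0.4927 for the exact value. With n = 24 (you plus 23 other guests) the exact value is about 0.462, which matches Pete's figure above.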