Wednesday, May 26, 2010
Unintentionally Harmful Econometrics
Labor economists Joshua Angrist and Jorn-Steffen Pischke have a neat little book on econometrics: Mostly Harmless Econometrics. Alas, the link to anything written by Douglas Adams is a bit strained, but I appreciate their light-hearted approach. Basically, they outline the fundamental problem that vexes econometric research: omitted variable bias.
Say you want to estimate how schooling affects earnings. People who go to school longer have higher earnings. But they also have greater discipline, come from better socio-economic backgrounds, and have higher IQ, all of which might be the true cause. If you don't include these in the regression, you attribute too much benefit to schooling via these omitted variables (that are often difficult--or in the case of IQ, taboo--to measure). So the book goes over all sorts of ways to tease out the true relationship. Foremost in this approach is the Freakonomics method of natural experiments. This is when some arbitrary event creates different samples where the variable of interest (schooling) is different, but the other factors (IQ, wealth, discipline) are the same.
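To make the omitted-variable problem concrete, here is a minimal simulation sketch. Every number in it is an assumption chosen for illustration, not an estimate from the book: a true schooling return of 3% per year, an unobserved "ability" variable that raises both schooling and wages, and normally distributed noise. Under those assumptions the naive regression of wages on schooling should come out near 0.055 rather than 0.03, and controlling for ability recovers the true value.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def ols_slope(x, y):
    """Slope from regressing y on a single regressor x (with intercept)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def residualize(y, x):
    """Residuals of y after an OLS fit on x (with intercept)."""
    b = ols_slope(x, y)
    mx, my = mean(x), mean(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate_ovb(n=20000, true_return=0.03, seed=0):
    # All parameters below are hypothetical, picked only for illustration.
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    # Higher-ability people get more schooling...
    schooling = [12 + 2 * a + rng.gauss(0, 2) for a in ability]
    # ...and also earn more for reasons unrelated to schooling.
    log_wage = [true_return * s + 0.10 * a + rng.gauss(0, 0.3)
                for s, a in zip(schooling, ability)]

    # Naive estimate: ability is omitted, so schooling gets its credit.
    naive = ols_slope(schooling, log_wage)
    # Controlled estimate: partial ability out of both variables first
    # (Frisch-Waugh), which equals the schooling coefficient in the
    # two-regressor OLS fit.
    controlled = ols_slope(residualize(schooling, ability),
                           residualize(log_wage, ability))
    return naive, controlled


naive, controlled = simulate_ovb()
print(f"naive: {naive:.3f}  controlled: {controlled:.3f}")
```

Of course, the whole difficulty in practice is that ability is not sitting in the data set waiting to be controlled for, which is why the natural-experiment approach exists.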
Basically, you compare group A, which got more schooling, with an otherwise indistinguishable group B that some arbitrary process stopped from getting more schooling. The signature example in the book is the Angrist and Krueger (1991) paper, which takes advantage of the fact that many states required kids to go to school until the age of 16, and they had to start school in September at age 6. Because school years end in June, this means kids born just before school starts would be 16 just before another year started, while those born just after would be only 15, and have to at least start another year. This causes kids to get different amounts of education merely because of their birth date, which is independent of IQ, discipline, and socio-economic status. A 'natural experiment'.
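The logic of the comparison above can be sketched with a toy simulation. This is not the Angrist and Krueger specification: the instrument here is a hypothetical coin flip standing in for "born just after the cutoff", the one-year first stage is far larger than anything in real data and is chosen only to keep the simulation stable, and the 3% return and the ability effect are assumed values. With a single binary instrument, two-stage least squares reduces to the Wald ratio: the wage gap between the two groups divided by their schooling gap.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def ols_slope(x, y):
    """Slope from regressing y on a single regressor x (with intercept)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def simulate_wald(n=200000, true_return=0.03, first_stage=1.0, seed=1):
    # All parameters are hypothetical and chosen for illustration only.
    rng = random.Random(seed)
    late, schooling, log_wage = [], [], []
    for _ in range(n):
        z = rng.random() < 0.5           # instrument: independent of ability
        a = rng.gauss(0, 1)              # unobserved ability
        s = 12 + 2 * a + first_stage * z + rng.gauss(0, 2)
        w = true_return * s + 0.10 * a + rng.gauss(0, 0.3)
        late.append(z)
        schooling.append(s)
        log_wage.append(w)

    s1 = [s for z, s in zip(late, schooling) if z]
    s0 = [s for z, s in zip(late, schooling) if not z]
    w1 = [w for z, w in zip(late, log_wage) if z]
    w0 = [w for z, w in zip(late, log_wage) if not z]

    # Wald estimator: wage gap between groups over schooling gap.
    wald = (mean(w1) - mean(w0)) / (mean(s1) - mean(s0))
    # Naive OLS for comparison; it still absorbs the ability effect.
    naive = ols_slope(schooling, log_wage)
    return wald, naive


wald, naive = simulate_wald()
print(f"wald: {wald:.3f}  naive OLS: {naive:.3f}")
```

Note that the Wald ratio is nothing more than two differences in group means, which is exactly the kind of result that should also be visible in a simple graph.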
The precise methods of finding 'identifying restrictions' are not so important. They are the mainstay of rigorous research, but ultimately, if your result shows up only via coefficient t-stats in 2-stage least squares, but not a graph, it isn't there. The earnings are what they are: what's the difference for the kids who went to school an extra year? Here, we see the problem with econometrics. The authors are very excited by the saw-tooth pattern that shows a significant birthday bump around December 31 for the Angrist and Krueger study, highlighting that staying in school an extra year provided a statistically significant bump in earnings.
But the effect is small: about a 3% increase in wages from the extra year for those kids born too late to skip their last grade. Sure, with enough observations it is significant, but not enough to change the world. It became an inspiration for a generation of Harvard economists like Levitt because the approach seemed to solve a problem using both cleverness and high-brow econometric techniques. Yet, it does not follow that staying in school an extra year at 16 implies going to college would also help. Very few effects are linear. I could take golf lessons and lower my score by 10 strokes in a few lessons; it would not, over a year, put me on the PGA tour. The authors seem oblivious to this point.
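The remark that a small effect becomes significant with enough observations is just arithmetic, and a back-of-the-envelope sketch shows it. Assume a simple two-group comparison of mean log wages with equal group sizes; the 0.6 standard deviation of log wages is a made-up round number, not a figure from the paper. The effect size stays at 3% throughout; only the sample size changes.

```python
import math


def t_stat(effect, sd, n_per_group):
    """t-statistic for a difference in means between two equal-sized
    groups with a common standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return effect / standard_error


# Hypothetical inputs: a 3% wage effect and a log-wage sd of 0.6.
for n in (1000, 10000, 100000):
    print(n, round(t_stat(0.03, 0.6, n), 2))
```

The t-statistic grows with the square root of the sample size, so with census-sized samples almost any nonzero effect clears the usual 1.96 threshold. Statistical significance says nothing about whether 3% is big enough to matter.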
One could say, he has a long section on nonlinearity, and indeed he does. But the inordinate amount of attention he pays to his little 3% wage increase belies this knowledge. The net-net inference is clearly that the 3% wage increase has policy implications, otherwise he, and the labor economics community, would not cite it so much. In the end, he doesn't think nonlinearities are relevant to his
One could imagine them testifying before Congress that their research proves we should spend more money on college, but that presumes a lot of things. It assumes a linear extrapolation of their findings on 16-year-olds. It assumes spending more money on college aid actually increases schooling, as opposed to increasing what schools charge. It assumes people will study subjects that are actually skill related, as opposed to becoming film-studies majors who don't learn anything useful.
Where all this excitement with econometric technique, and natural experiments, over common sense leads is best exemplified by this paper on the impact of file-sharing on record sales by Oberholzer-Gee and Strumpf, published as the lead article in the Feb 2007 Journal of Political Economy (Levitt's journal) to great fanfare. The relationship they measure between the instrument (German students on vacation) and the variable it is instrumenting for, American downloading, is large, presumably because Germans download and share a lot of music, and spend a lot of time sharing files when on vacation.
Anyway, most of Oberholzer-Gee and Strumpf's work emphasizes econometrics, not the stuff Stan Liebowitz brings up, such as the proportion of school kids who actually download music as a percent of music downloads in Germany and their percent of US music sales. Vacation time correlates with traditional US seasonal patterns, as music sales spike before Christmas, when German Junge are in school, for reasons unrelated to file sharing (stockings need CDs). Basically, the empirical issue is one of institutional detail, knowing about the seasonality of music sales, school schedules, and the timing of music downloads, and has very little to do with the identifying restrictions that are emphasized in their academic piece.
In a sense, econometrics as a science allows shoddy empirical work to hide behind pretentious techniques that try to avoid these issues. Vector auto-regressions, or natural experiments, do not obviate the need for common sense and an understanding of the subject to which the tool is being applied. The manifest failure to predict stock returns, business cycles, and interest rates that became so apparent in the 1980s caused econometricians to search for small subjects where natural experiments exist, and then draw some grand extrapolation. People think about the question less than the method, and then assume that because the results are so clean, they will be profound. I think many econometricians wish that it were so, so they could focus on what they really like, math, while still saying something interesting about important issues like schooling.
If the only way to tease out, say, the risk premium, is to use a method-of-moments estimation with 3 identifying restrictions based on some fundamental utility function, but the bottom line is you can't say whether Coke is a riskier stock than GM, you don't have a result. It's academic.