In my never-ending litigation from the Dark Side, the judge at one point said: "I was reading the New Yorker, and Harry Kat seems like an expert on Hedge Funds. Why can't we ask him about this?" Not good.
But anyway, it seems overfitting is a common vice because it is so tempting: something that fits a set of data 'could' be the model driving it. This error is especially common for those with natural science backgrounds, because small samples aren't really a problem for physicists and chemists, and so they don't develop any intuition for these issues.

Thus, I got this copyrighted white paper, so I can't post it, but it's free, and I got it at Albourne Village. It's Fund of Hedge Fund Portfolio Risk, by Investor Analytics. It tries, like Kat, to reverse engineer a fund by throwing a bunch of time series against it. The paper notes, "most FoHF managers use a minimum of 24 data points (2 years of monthly returns), but 36 data points is generally preferred". Hey, after 30 data points, a Student's t distribution approximates a Gaussian distribution, so...

Anyway, they give an example with 29 data points. They find an R² of 0.8 using a combination of aluminum, precious metal, natural gas, and gas oil futures, and the square of crude oil. They note that they used the square of crude oil because the initial set of factors included cocoa, and the portfolio managers don't trade cocoa. OK, so you have three highly correlated inputs (gas, gas oil, oil^2) and then a precious metal and an industrial one. I wonder what random set of commodity futures best explains the S&P over the past 29 months? I'm sure there is a set.
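The point is easy to demonstrate with a toy simulation. Here's a minimal sketch, using purely synthetic data (not actual S&P or commodity returns): the "target" and all the candidate "factors" are independent random noise, so any fit is spurious by construction, yet greedy factor selection over a pool of candidates still racks up a respectable in-sample R² on 29 observations.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_candidates, n_select = 29, 50, 5

# Hypothetical stand-in for 29 months of fund returns: pure noise.
target = rng.normal(0.0, 0.04, n_obs)
# 50 random series playing the role of candidate commodity factors.
factors = rng.normal(0.0, 0.04, (n_obs, n_candidates))

def r_squared(X, y):
    """In-sample R^2 of an OLS regression of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (resid @ resid) / tss

# Greedy forward selection: at each step, add whichever remaining
# factor raises the in-sample R^2 the most.
chosen = []
for _ in range(n_select):
    best = max((j for j in range(n_candidates) if j not in chosen),
               key=lambda j: r_squared(factors[:, chosen + [j]], target))
    chosen.append(best)

r2 = r_squared(factors[:, chosen], target)
print(f"In-sample R^2 with {n_select} random 'factors': {r2:.2f}")
```

Nothing here has any predictive content: with 29 observations and the freedom to pick the best 5 of 50 noise series, a fat R² is close to guaranteed, which is exactly the trap the white paper walks into.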
I thought, who writes this stuff? So I go to the Investor Analytics website, where the CEO is described as follows:
Damian received his undergraduate degree from the University of Pennsylvania and his doctorate in nuclear physics while working at the National Superconducting Cyclotron Laboratory in Michigan.
Two cheers for economics. Experience with time series, and with the limitations of Keynesian macro models, leading economic indicators, Kalman filters, and vector autoregressions, prevents us from wasting time this way. Or at least it should.