Falkenblog: Backtesting Errors

Wednesday, January 28, 2009

Backtesting Errors

The most annoying part of backtesting is creating the database. This is hard because most data sources are built to show a snapshot of current information, not the cross-section as it stood on, say, January 14, 1998. It's free and easy to see all about, say, IBM's current market and financial statement data. But what good is that information without context, and what is the context other than a historical sense of the relative distributions and correlations?

Pulling together such data usually involves integrating data from different sources, splicing them together, and making sure you have 'dead' companies. Invariably a data provider insists they have 'all' the data, because for them, current companies are all they can conceive of. So you have to be specific, and ask instead if they have 'dead' companies, such as Enron, WorldCom-MCI, and Bear Stearns. Then there are issues such as splits and dividends, which, if not accounted for, can generate illusory patterns.
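To make the split point concrete, here is a minimal sketch with made-up prices. The function name `simple_return` and its parameters are hypothetical, not from any data vendor, and the dividend is assumed to be quoted per post-split share. It shows how an unadjusted 2-for-1 split looks like a 50% one-day loss.

```python
def simple_return(prev_close, close, split_ratio=1.0, dividend=0.0):
    """One-day total return per old share.

    split_ratio: new shares received per old share on the ex-date (assumption).
    dividend: cash paid per post-split share on that day (assumption).
    """
    return (close + dividend) * split_ratio / prev_close - 1.0


# Hypothetical stock: closes at 100, splits 2-for-1, closes at 50 the next day.
naive_return = simple_return(100.0, 50.0)                      # ignores the split
adjusted_return = simple_return(100.0, 50.0, split_ratio=2.0)  # accounts for it
print(naive_return, adjusted_return)
```

The naive number is an illusory crash; the adjusted one says nothing happened.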

Anyway, I discovered something I thought was really interesting, but then found it was merely an error. See if you can spot the error. I have a database with financial statement information available at the end of every month. With each record or observation, I have the month-ahead returns, and also information on the trading volume and market cap as of the end of the month.

I looked at earnings-day returns: the returns from the day prior to the earnings release date to the close the day after the earnings release date. To do this, I had to take the earnings report date information, and then take those date-firmID pairs to a database of daily return data. I noted that daily (annualized) volatility in this window is about 150% higher than on an average day in this period, which makes total sense. Just for fun, I looked at all companies with a market cap greater than $1B, and saw that the average daily return on this date was about 0.2% or so, highly significant. On average the return was significantly positive, but no one noticed because for any one observation there is a lot of noise, and it's not so large as to be totally obvious. I then looked at companies with a market cap greater than $500MM: same result, and it was highly consistent over time.
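As a rough sketch of that lookup, assume the window runs from the close on the trading day before the release to the close on the trading day after it. The data layout, the firm ID `FIRM_A`, and the function name are all hypothetical, and the prices are invented.

```python
from datetime import date


def earnings_window_return(closes, announce_date):
    """Return from the close before the announcement to the close after it.

    closes: list of (date, close) tuples for one firm, sorted by date.
    """
    dates = [d for d, _ in closes]
    i = dates.index(announce_date)  # raises ValueError if not a trading day
    if i == 0 or i + 1 >= len(closes):
        raise ValueError("need a trading day on each side of the announcement")
    prior_close = closes[i - 1][1]
    next_close = closes[i + 1][1]
    return next_close / prior_close - 1.0


# Hypothetical daily price table keyed by firm ID.
daily = {
    "FIRM_A": [
        (date(1998, 1, 12), 100.0),
        (date(1998, 1, 13), 101.0),
        (date(1998, 1, 14), 104.0),  # earnings release day
        (date(1998, 1, 15), 103.0),
    ],
}

# The date-firmID pairs taken from the earnings report data.
events = [("FIRM_A", date(1998, 1, 14))]

window_returns = {
    (firm, day): earnings_window_return(daily[firm], day) for firm, day in events
}
print(window_returns)
```

Note that nothing in this step uses market cap. The size filter gets attached afterwards, and which date that filter comes from matters.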

What is the error?

Well, my size filter used market cap data from the end of the month to look back at the earnings date returns from that same month. By looking only at companies with greater than $1B or $500MM in market cap at the end of the month, it would include enough of those that migrated upward, and exclude enough of those that migrated downward, to generate the 0.2% return, which was entirely due to this bias. That is, there are enough companies moving from $975MM to $1001MM on earnings that get in, and enough going from $1001MM to $975MM that are censored, to generate an illusory sample statistic.
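The effect is easy to reproduce in a toy simulation. This is only a sketch under invented assumptions, not my actual data: caps are drawn uniformly between $500MM and $1,500MM, earnings returns are normal with a true mean of zero and a 5% standard deviation, and the month-end cap is just the starting cap times one plus that return.

```python
import random


def simulate_cutoff_bias(n_firms=200_000, threshold=1000.0, vol=0.05, seed=0):
    """Compare a size filter applied after the event with one applied before it."""
    rng = random.Random(seed)
    after_filter, before_filter = [], []
    for _ in range(n_firms):
        cap_before = rng.uniform(500.0, 1500.0)  # market cap ($MM) before earnings
        ret = rng.gauss(0.0, vol)                # true mean return is zero
        cap_after = cap_before * (1.0 + ret)     # market cap ($MM) at month end
        if cap_after > threshold:                # filter that looks ahead
            after_filter.append(ret)
        if cap_before > threshold:               # filter known before the event
            before_filter.append(ret)
    mean_after = sum(after_filter) / len(after_filter)
    mean_before = sum(before_filter) / len(before_filter)
    return mean_after, mean_before


biased_mean, unbiased_mean = simulate_cutoff_bias()
print(f"filter on end-of-month cap:  {biased_mean:.4%}")
print(f"filter on pre-earnings cap:  {unbiased_mean:.4%}")
```

With the end-of-month filter the average return comes out clearly positive even though the true mean is zero by construction. Filtering on the cap known before the event leaves it near zero. The size of the bias depends entirely on the assumed volatility and cap distribution, so the simulated number should not be read against the 0.2% above.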

I didn't think a $1B cut-off was material, especially because usually this database is used to look for patterns in month-ahead returns where the bias does not exist, so it didn't occur to me. But in fact there was a material selection bias in the market cap cut-off. These things are subtle sometimes.

6 comments:

Anonymous said...: In my testing I usually optimize user-introduced parameters over a range and create a parametrized Sharpe surface. It allows me to quickly and visually inspect the local maxima area for various parameters to see if the Sharpe is robust across parameters, which gives greater confidence in the output. A small Sharpe surface would alert me to some sort of bias or curve-fitting.; 11:33 AM
Eric Falkenstein said...: That's a good test. But in this case, no matter the parameter cutoff, you would see the pattern, unless you simply used all data, but that includes so many non-tradeable stocks you could then say the result is robust where it matters.; 12:37 PM
Anonymous said...: Who found the error? Did you catch it yourself?

Why put an upward bound on the range at all? Why not just a lower bound, because those are inefficient? Why not make the criteria related to the float that is not owned by management/insiders?; 2:56 PM
Eric Falkenstein said...: It was a lower bound (ie, all companies >$1B). I didn't have volume, clearly that would have avoided the problem. I found the error.; 3:40 PM
Anonymous said...: what a lightweight! no soup for you!
now how about my inflation query?; 6:16 PM
Anonymous said...: Another major issue is that you don't always know the earnings announcement date in advance. Sometimes earnings are expected to be on a certain future date, but then they are delayed [e.g. the CPAs refused to sign or something]. Is that issue taken into account in your study?; 5:40 PM