Wednesday, 11 May 2011

Reply to Greg

Greg has asked me a very good question and, if I'm honest, it's something I should have looked at before now. I'm always wary, though, of trying to draw any meaningful conclusions from data that is 100% backfitted. Simply put, you will NEVER achieve future returns similar to the results you get from backfitting a multivariate model to your data. The more variables your model has, the more backfitted the data is likely to be, and the less reliable it is likely to be when it comes to projecting the future.

Anyway, for this reason, I deliberately don't use the data from the very early seasons that I used to build the ratings, as in all honesty it would make my results look too good!
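To illustrate why backfitted results flatter a model, here is a minimal Python sketch using made-up data (not my actual ratings or results). It fits polynomials of increasing degree, standing in for models with more and more variables, to the first part of a synthetic P&L series ("backfitting") and then scores each fit on the remaining points ("backtesting"). The function name `in_and_out_of_sample_error`, the 120/80 split and the use of polynomial degree as a proxy for model complexity are all assumptions for illustration only.

```python
import numpy as np

def in_and_out_of_sample_error(y, split, max_degree=6):
    """Fit polynomials of degree 1..max_degree to the first `split` points
    ("backfitting") and score each one on the remaining points ("backtesting").

    Illustrative helper only. Returns (degree, in_sample_mse, out_of_sample_mse) tuples.
    """
    y = np.asarray(y, dtype=float)
    x = np.linspace(0.0, 1.0, len(y))  # normalised time axis (bet index scaled to 0..1)
    x_fit, y_fit = x[:split], y[:split]
    x_test, y_test = x[split:], y[split:]
    results = []
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(x_fit, y_fit, degree)  # least-squares fit on the early data only
        in_mse = float(np.mean((np.polyval(coeffs, x_fit) - y_fit) ** 2))
        out_mse = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
        results.append((degree, in_mse, out_mse))
    return results

# Synthetic example (made-up data): a steady upward P&L plus noise, 200 "bets".
rng = np.random.default_rng(seed=1)
pnl = np.cumsum(1.0 + rng.normal(0.0, 3.0, size=200))
for degree, in_mse, out_mse in in_and_out_of_sample_error(pnl, split=120):
    print(f"degree {degree}: in-sample MSE {in_mse:8.1f}, out-of-sample MSE {out_mse:12.1f}")
```

Because each higher-degree fit contains the lower-degree ones as special cases, the in-sample error can only stay the same or fall as the degree rises. The out-of-sample error has no such guarantee, and high-degree fits typically fall apart once they are extrapolated into the unseen points.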

Greg has asked me to graph the results from the early seasons along with the most recent seasons to get a long-term view. I can't stress enough how misleading the results may be (what happens during backfitting isn't likely to happen when the systems go live, blah blah) but, caveats aside, I thought it was worth looking at.

After a wee bit of work, I can now share these graphs.

Before doing this analysis today, my hypothesis would have been that if I drew a trend curve over the P&L line, the early seasons would show something like exponential returns, and that things would level off as we move into less backfitting and more backtesting. Here are the results:

The top graph shows the P&L as requested by Greg. The second graph shows the same P&L but highlights the periods where only backfitting takes place, where there is a mixture of backfitting and backtesting, and where only backtesting occurs. At the end, we can also see the live results from this season.

The final graph is purely for my own curiosity: it shows a 6th-degree polynomial trend line fitted to the P&L, i.e. y = Ax^6 + Bx^5 + Cx^4 + Dx^3 + Ex^2 + Fx + G.
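For anyone wanting to reproduce this kind of trend line, here is a minimal Python sketch using NumPy's `polyfit`. It assumes you have the cumulative P&L as a list of numbers, one per bet; `polynomial_trend` is a hypothetical helper name. I rescale the bet index to run from 0 to 1 to keep the fit numerically stable, so the coefficients A to G will differ from a trend line fitted against raw bet numbers, although the fitted curve itself is the same.

```python
import numpy as np

def polynomial_trend(pnl, degree=6):
    """Fit y = A*x^6 + B*x^5 + ... + F*x + G (for degree=6) to a cumulative P&L series.

    Hypothetical helper. x is the bet index rescaled to [0, 1] for numerical stability.
    Returns (coefficients, highest power first, and the fitted trend values).
    """
    y = np.asarray(pnl, dtype=float)
    x = np.linspace(0.0, 1.0, len(y))
    coeffs = np.polyfit(x, y, degree)  # [A, B, C, D, E, F, G] when degree == 6
    trend = np.polyval(coeffs, x)      # trend line evaluated at every bet
    return coeffs, trend
```

Usage would be something like `coeffs, trend = polynomial_trend(cumulative_pnl)`, then plotting `trend` over the P&L line.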

As you can clearly see, the backfitted data seems to produce a much more exponential curve, whereas the backtested and live data produce something closer to a straight line. During my backfitting, my aim was to produce 45-degree profitability curves, and you can see from my previous post that the backtesting produced what I was after.
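One rough way to put a number on "exponential versus straight line" is to fit a quadratic to each segment of the P&L and look at the sign of the x^2 term. This is a swapped-in heuristic rather than a proper exponential fit: a clearly positive value suggests the P&L is curving upwards, a value near zero suggests a steady straight line, and a negative value suggests it is flattening off. The helper name `curvature` is hypothetical, and the segment boundaries (backfit, mixed, backtest, live) would have to come from your own records.

```python
import numpy as np

def curvature(segment_pnl):
    """Return the x^2 coefficient of a quadratic fitted to one P&L segment.

    Hypothetical helper. Roughly: > 0 means accelerating profits (curving upwards),
    about 0 means a steady straight-line P&L, < 0 means the P&L is flattening off.
    """
    y = np.asarray(segment_pnl, dtype=float)
    x = np.linspace(0.0, 1.0, len(y))  # same 0..1 scaling for every segment
    return float(np.polyfit(x, y, 2)[0])  # leading coefficient of a*x^2 + b*x + c
```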

As an aside (you'd need to look at my previous post again), the first half of this season in the live environment actually produced returns more in line with an exponential trend (i.e. like the backfitting rather than the backtesting), but the second half of the season has been loss-making and has put paid to anything like a 45-degree P&L this season.

Just on that loss-making point: amazingly, you can see much bigger drops during the backfitting seasons using my rating algorithms than I experienced during the backtesting. It's something I hadn't noticed before, but clearly my backtesting worked out pretty well, as it gave a nearly perfect 45-degree P&L line. In reality, looking at these 9 seasons now, there have been at least 2 periods where the systems lost more points than they have lost since Christmas this season. Hence, my systems aren't on their worst-ever run.
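Here's a minimal Python sketch of how you could check this from a list of per-bet results in points. It splits the running P&L into peak-to-trough drawdowns and counts how many were deeper than a given loss, such as the points lost since Christmas. The function names and inputs are hypothetical, and a drawdown here is assumed to mean the largest fall from a high in the running P&L before a new high is made.

```python
def drawdowns(pnl_per_bet):
    """Split a running P&L into peak-to-trough drawdowns (hypothetical helper).

    Returns the depth of each drawdown in points, as positive numbers. If the
    P&L has not yet regained its last high, the final entry is the current run.
    """
    running = 0.0  # cumulative P&L so far
    peak = 0.0     # highest cumulative P&L seen so far
    deepest = 0.0  # deepest fall below the current peak
    result = []
    for points in pnl_per_bet:
        running += points
        if running > peak:
            # New high: close off any drawdown that was in progress.
            if deepest > 0:
                result.append(deepest)
            peak = running
            deepest = 0.0
        else:
            deepest = max(deepest, peak - running)
    if deepest > 0:
        # Still below the last high: record the unfinished current run.
        result.append(deepest)
    return result

def count_worse_runs(pnl_per_bet, current_loss):
    """Count drawdowns that were deeper than `current_loss` points."""
    return sum(1 for depth in drawdowns(pnl_per_bet) if depth > current_loss)
```

For example, `count_worse_runs(results_all_seasons, loss_since_christmas)` (both names hypothetical) would give the number of earlier runs that were worse than the current one.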

I hope the above answers your question, Greg. Doing this has actually helped me, as I've realised that my systems have had worse runs than the one they are on at the moment. Hence, I'm probably going over the top about how bad this run is. Yeah, it's the worst in 5 seasons but it's not the worst in 9 seasons...... lol


  1. Thanks for that. Came across the blog last week and it has been fascinating reading so far. Really looking forward to next season and following the bets.


  2. No probs Greg.

    Any other questions mate, just give me a shout.