OK, this post is about two weeks overdue I suspect (maybe longer as I can’t remember!) but here’s a quick summary of the rating algorithms and the systems.
I started out in January 2010 with the idea of developing my own football ratings. After playing about with data sources and different leagues, I had something built at the end of March that was simply a prototype. I tested it on a few leagues at the end of that season and the results looked promising.
The Summer arrived and I built my first full algorithm. Before I had finished backtesting it, I’d already had the idea for a second algorithm based on what I had learnt from the first. Once you know how to build something like this, it is fairly easy to replicate.
I often switch between talking about algorithms and systems on the blog and I know people don’t always get what I’m talking about. I’ll keep this very brief and make it seem easy!
In simple terms, an algorithm is a sophisticated model that is used to produce football ratings for each game. Once you have the ratings for each game, you can test them against the odds available for each outcome in the game and, using either software or Excel, try to recalibrate the ratings so that they produce a profit during backtesting.
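To give a feel for what "testing the ratings against the odds" can look like, here is a minimal sketch in Python. It assumes the rating for an outcome has already been turned into a probability, which is an assumption for illustration and not a description of how my ratings actually work. The function names are made up for the example, and the implied probability ignores the bookmaker's margin.

```python
def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds, ignoring the bookmaker's margin."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def expected_profit(model_probability: float, decimal_odds: float) -> float:
    """Expected profit per 1 point staked, if the model probability is right."""
    return model_probability * decimal_odds - 1.0


def is_value_bet(model_probability: float, decimal_odds: float) -> bool:
    """A bet is 'value' when the model rates the outcome above the odds."""
    return model_probability > implied_probability(decimal_odds)


# Hypothetical numbers: the model says 50%, the odds on offer are 2.5.
print(is_value_bet(0.5, 2.5))
print(expected_profit(0.5, 2.5))
```

Backtesting is then a case of running a check like this over historical games and adding up the profit or loss on the bets it flags.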
My algorithms are pretty basic in the sense they are multivariate linear models. Hence, they are simply along the lines of:
Factor A * Weighting A + Factor B * Weighting B and so on. In total, each algorithm has 28 factors and 28 different weightings. I initially started off with 7 factors, which became 14, which became 28. In a way, I suspect the factors aren’t the thing that separates my ratings from others. The weights on each of the factors are my USP.
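The weighted sum above can be sketched in a few lines of Python. The factor values and weightings below are invented purely for illustration (three factors instead of 28), and `match_rating` is just a name picked for the example.

```python
from typing import Sequence


def match_rating(factors: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum: Factor A * Weighting A + Factor B * Weighting B + ..."""
    if len(factors) != len(weights):
        raise ValueError("need exactly one weighting per factor")
    return sum(f * w for f, w in zip(factors, weights))


# Hypothetical values for three factors; the real algorithms use 28.
example_factors = [6.0, 4.0, 1.0]
example_weights = [0.5, 0.25, 2.0]
print(match_rating(example_factors, example_weights))
```

The structure really is that simple. All of the work goes into choosing the weightings.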
How did I determine the optimal weights? Well, I used 6 years of historical data and, with the help of a colleague at work who’s into PC SAS, built a stochastic model to find the weights that backfit a model producing profits based on certain criteria.
I can tell you the best indicators when it comes to building a football ratings model and I can also tell you the relative importance of one rating factor to another. For example, rating factor A is 2.45 times more important than factor B.
In terms of the factors, it is fairly straightforward, with shots on goal and shots on target being the key underlying variables in the model. Whereas people look at the results of games to determine value, I look at shots on goal and try to find teams who are playing better than the league table and their results indicate, as these are the ones the odds compilers and general public will often underestimate.
Being honest, all of the above is pretty basic stuff for anyone who knows what they are doing.
I think my other USP at this game was finding an algorithm that worked and then building a second one straight after! The power of two algorithms only became apparent after last season started admittedly but hopefully if you follow the blog, you’ll see why having two algorithms is better than one!
Once the algorithm was built, it was then a simple game of backfitting systems on the historical ratings data to come up with systems that can be used going forward.
I made the decision early on never to use the most recent seasons in my backfitting, as I needed a large sample of data to backtest my systems on. I was therefore confident that if the systems worked when I backtested them over the most recent seasons, they were likely to work when they went live.
Last season, I started proofing bets from my first two algorithms. Instead of just proofing the algorithm bets (every bet), I came up with this unique way of looking at things. This is another USP!
I have 10 systems for each algorithm. What I do is pick out all the bets for the 10 systems and then depending on how many times each team appears, they make their way onto a system. The 10 systems are split 5 home systems and 5 away systems.
So, I have systems A,B,C,D,E,F,G,H,I and J for each algorithm.
If systems A, B and C all pick the same team, the team makes it onto system 8. If systems A and E pick a team, it appears on system 7. If only system B picks a team, it makes its way onto system 6. Likewise for systems 21 and 22 on the second algorithm.
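As a rough sketch of that counting step in Python: the mapping below only contains the three cases described above (one pick gives system 6, two give system 7, three give system 8). Treating each team as landing on exactly one numbered system is an assumption made for the example, and the team names and the `assign_systems` function are made up.

```python
from collections import Counter

# Number of lettered systems picking a team -> numbered system (algorithm 1).
# Only these three cases are described in the post.
COUNT_TO_SYSTEM = {1: 6, 2: 7, 3: 8}


def assign_systems(picks_by_lettered_system, count_to_system=COUNT_TO_SYSTEM):
    """Map each team to a numbered system based on how many systems picked it."""
    counts = Counter()
    for teams in picks_by_lettered_system.values():
        counts.update(set(teams))  # each lettered system counts a team once
    return {
        team: count_to_system[n]
        for team, n in counts.items()
        if n in count_to_system  # counts outside the mapping are left out
    }


# Hypothetical picks for one weekend.
picks = {
    "A": ["Team 1", "Team 2", "Team 3"],
    "B": ["Team 1", "Team 4"],
    "C": ["Team 1"],
    "E": ["Team 2"],
}
print(assign_systems(picks))
```

Here Team 1 is picked by A, B and C so it lands on system 8, Team 2 is picked by A and E so it lands on system 7, and Teams 3 and 4 each have a single pick so they land on system 6.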
After watching the results last season of systems 6,7,8,21,22, someone mentioned to me that they were following the bets that were picked out by system 8 and 22 and the returns were exceptional. I hadn’t thought of this, so I started a new project of looking at the returns by backing teams that appear on two systems.
Quite quickly, I picked up on the fact the results were out of this world and so I started proofing the bets on the combined systems. Of course, the combined system bets were already proofed from the start of the season, because if a bet appeared on 6 and 21, then it automatically became a bet on system 6-21 even though I didn’t know it at the time!
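In code terms, a combined system is just the overlap between a system from the first algorithm and a system from the second. Here is a minimal Python sketch, with made-up team names and a hypothetical `combined_systems` helper:

```python
def combined_systems(algo1_bets, algo2_bets):
    """Build combined systems such as '6-21' from the overlap of two systems."""
    combined = {}
    for n1, bets1 in algo1_bets.items():
        for n2, bets2 in algo2_bets.items():
            # A bet is on system n1-n2 only if it appears on both systems.
            combined[f"{n1}-{n2}"] = set(bets1) & set(bets2)
    return combined


# Hypothetical bets for one weekend.
algo1 = {6: {"Team 1"}, 7: {"Team 2"}, 8: {"Team 4"}}
algo2 = {21: {"Team 2", "Team 3"}, 22: {"Team 1", "Team 4"}}

for name, bets in combined_systems(algo1, algo2).items():
    print(name, sorted(bets))
```

Because the combined bets are derived entirely from bets that were already proofed, nothing new needs to be picked out to track them.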
Anyway, that explains how I got the first two rating algorithms and systems 6,7,8,21,22,6-21,6-22,7-21,7-22,8-21 and 8-22.
Before I go on, an obvious question: what happened to systems 1-5, 11-20 and 26-30? In addition, what happened to systems 9, 10, 23, 24 and 25?
Well, systems 1-5, 11-15 and 26-30 are just other rating algorithms that lie somewhere on the cutting room floor. They didn’t make the grade before going live. The reason for not renaming any of the live systems is that the people I was sharing all of the model building information with were used to the names and it felt wrong to change them. In addition, I was also used to the names and became quite fond of system 8 etc., so I didn’t want to swap them for anything fancier.
Systems 16-20 were multiple bets systems based on building multiple bets from the single bet systems. They were proofed on the blog last year but were very high risk, high return and not suited to portfolio betting. They provided me with my highest winning day on betting ever last season but also accounted for a steady stream of losses, so they were dropped from the portfolio of systems this season. The results remain on the blog.
Systems 9,10,23,24 and 25 were all live last season but were dropped for this season. They were all dropped for the same reason of not having enough turnover.
As for the results last season, they are all on the blog and proofed independently by the Secret Betting Club.
Where did systems 31,32,33 come from? Well, using the same approach to the first two algorithms, I built a new algorithm in the Summer of 2011 to use for this season. This is the 3rd algorithm.
What are systems TOX, STOY and STOZ? Well, after building the 3rd algorithm in the Summer, I had a couple of weeks left before the season started. I started reading about someone who had built a Similar Games Model (SGM) for basketball games in the US and I decided I’d like the challenge of trying to do the same for the footie this season. Obviously, to ensure I don’t do anything mad, I cross-reference the SGM with my ratings algorithms, so every team that appears on these systems also appears on one of my algorithms.
The names don’t mean anything to anyone apart from me and again, due to the fact I’m used to the names, I’ll keep them as they are.
During backtesting, the SGM systems produced the best results I’ve ever seen from a football system, so I had very high hopes for this season for these bets. So far, they’ve been rubbish and to outsiders, may look like I’m wasting my time doing an hour’s extra work every week to find and track these bets but I don’t tend to be wrong too often about systems. Long-term, I’ll be surprised if the returns from these SGM systems don’t beat all my other systems. However, at the moment, they are trailing way behind and are actually loss making as of today this season. Quite an achievement considering all 3 rating algorithms are in profit! Anyway, you’ve all heard of the Tortoise and the Hare, so don’t write these systems off yet.
Amazingly, I’ve managed to write a summary of the systems in exactly 1500 words. Just!
Here’s a diagram which explains it all with no words. :)
(and that, readers, is 1,500 words)