Friday, 8 June 2012

European Rating Algorithm One (System E1)

I’m not going to go into depth about how I build the ratings because, quite simply, I think this is what separates me from every other analyst in the world who’s trying to build profitable football ratings. I’ve given away bits and pieces of information over the past couple of years, along with plenty of hints about how I do it and which factors I look at, but that’s as far as I’ll go.

What I would say is that the factors in the two European algorithms are IDENTICAL to the factors in the UK algorithms. When I first told a few people I was going to look at the European leagues, they said I’d no doubt find that other factors were more important in these leagues than in the UK leagues. Nope, I didn’t. I did test for other factors, but every factor that appears in the UK algorithms also appears in the European algorithms.

In a way, this is quite reassuring for me as it means the work I did nearly 2.5 years ago is still valid! When I was looking at factors for building football ratings, I downloaded data from lots of different leagues, dumped it in a spreadsheet and started to analyse it. I didn’t care which league it was from, what time of year it was from and so on. My findings then were that shots on goal and shots on target were the best underlying indicators, and that remains the case with the new European leagues I’m looking at.

One big difference I can see between the UK and European algorithms is the role of historical results. They play no part in the UK algorithms (the factor is in the model but has a weighting of 0 in every algorithm), yet in the European algorithms there appears to be a correlation between historical results and future results in a fixture. Hence, this factor carries some weighting in the European rating algorithms.

Without giving away the weightings and variables, I’d say short-term form matters more in the UK than in Europe, whereas long-term form matters more in Europe. That’s a definite trend I picked up. I’d also say home form matters much more in Europe than in the UK Leagues.
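To illustrate what weighting short-term against long-term form can look like, here is a minimal Python sketch. It borrows the 4-game cut-off for short-term form described in the comments below, but everything else is an assumption for illustration: the per-game score (e.g. 3/1/0 points), the exponential decay, and the blend weight in the hypothetical form_rating helper. None of these are the actual weightings in the algorithms.

```python
from typing import Sequence, Tuple


def decayed_form(results: Sequence[float], decay: float,
                 short_window: int = 4) -> Tuple[float, float]:
    """Split a team's results (most recent first) into short- and long-term form.

    Assumptions (hypothetical): results are per-game scores such as 3/1/0 points,
    and weights decay geometrically by `decay` per game within each window.
    Games in the last `short_window` matches count as short-term form,
    anything older as long-term form. Returns two weighted averages.
    """
    def weighted_avg(games: Sequence[float]) -> float:
        weights = [decay ** i for i in range(len(games))]  # newest game weighs most
        total = sum(weights)
        if total == 0:
            return 0.0  # no games in this window
        return sum(w * g for w, g in zip(weights, games)) / total

    return weighted_avg(results[:short_window]), weighted_avg(results[short_window:])


def form_rating(short: float, long_: float, short_weight: float) -> float:
    """Blend the two forms; short_weight in [0, 1] is a hypothetical parameter.
    A higher short_weight emphasises recent form, a lower one long-term form."""
    return short_weight * short + (1 - short_weight) * long_
```

For a team that lost its last 4 games after winning the 4 before, decayed_form([0, 0, 0, 0, 3, 3, 3, 3], 0.9) gives a short-term form of 0 and a long-term form of 3. A blend leaning on recent games rates that team poorly, while one leaning on long-term form (say short_weight=0.3) rates it at about 2.1, which is the kind of shift in emphasis described above.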

The biggest difference by far between the UK and European Leagues is the performance of Home and Away bets. Basically, it’s very easy (almost hard to lose!) to build a rating algorithm that produces a fantastic return on Homes but, unfortunately, the better the algorithm is on Homes, the worse it is on Aways!

It took me a fair bit of time to understand this dynamic. When I first asked my program (basically a SAS program which takes in the data and runs simulations) to maximise the return on some variables in the algorithm, it threw up massive profits on Homes and massive losses on Away selections for some variables! Hence, the optimal solution was actually to ignore the Aways and concentrate on the Homes.

I did consider having separate algorithms for Homes and Aways but, after discussing it with a few other football analysts, I decided this was far too big a risk. For a start, I could have ended up backing opposing teams in some games, which would have given me a headache. Secondly, it is so different to what I did with the UK algorithms that I would have been going into the unknown.

Therefore, I reined back the Home selections a little and found weightings in the model which ensured I could achieve a decent profit on the Away bets too. Sacrificing a little return on Homes to get a better return overall isn’t a bad thing and, when I think back, I did a similar thing with the UK systems at the beginning, but the opposite way around! All the profits were on the Aways in the UK leagues, so I pulled these back a bit and ended up finding some really good Home bets.
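The trade-off described above (giving up some return on Homes so the Aways also make a profit) can be framed as a constrained search. The sketch below is not the SAS program mentioned earlier; it assumes a single tunable weight, a plain grid search and a hypothetical simulate function that backtests one weight and returns (home_profit, away_profit).

```python
from typing import Callable, Iterable, Optional, Tuple


def best_weight(
    candidates: Iterable[float],
    simulate: Callable[[float], Tuple[float, float]],
    min_away_profit: float = 0.0,
) -> Optional[float]:
    """Grid search for the weight with the highest total profit, subject to the
    Away bets making at least `min_away_profit`.

    `simulate(w) -> (home_profit, away_profit)` is a hypothetical backtest hook.
    Returns None if no candidate satisfies the constraint.
    """
    best_w, best_total = None, float("-inf")
    for w in candidates:
        home, away = simulate(w)
        if away < min_away_profit:
            continue  # reject weightings that sacrifice the Away bets
        total = home + away
        if total > best_total:
            best_w, best_total = w, total
    return best_w


def toy_simulate(w: float) -> Tuple[float, float]:
    """Made-up backtest: pushing w up helps Homes and hurts Aways."""
    return 100 * w, 50 - 80 * w
```

With the toy backtest and candidates [0, 0.25, 0.5, 0.75, 1.0], the unconstrained optimum (min_away_profit=float("-inf")) is 1.0, which loses money on the Aways, while requiring a non-negative Away profit pulls the choice back to 0.5: a slightly lower total, but profitable on both sides.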

One other change between the original backtesting for the UK systems and the Euro systems is the overround. In the UK systems, I adjusted the draw odds to account for the fact that the draw odds I was using were far too low and the overround too high. If you use a bookie like Pinnacle for draw odds, you can easily beat the draw odds I’m using in the backtesting results. There is NO adjustment in these results though.
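To make the overround point concrete, here is a short sketch that computes the bookmaker margin on a 1X2 market from decimal odds, plus the simplest (proportional) way of stripping that margin out. The prices in the usage note are made up for illustration.

```python
from typing import List


def overround(home: float, draw: float, away: float) -> float:
    """Bookmaker margin on a 1X2 market: sum of implied probabilities minus 1."""
    return 1 / home + 1 / draw + 1 / away - 1


def fair_probabilities(home: float, draw: float, away: float) -> List[float]:
    """Remove the margin by scaling implied probabilities so they sum to 1.
    This is the simplest proportional method; other methods exist."""
    implied = [1 / home, 1 / draw, 1 / away]
    total = sum(implied)
    return [p / total for p in implied]
```

With made-up prices of 2.00/3.20/4.00 the overround is 6.25%. Improving only the draw price to 3.40 (the kind of better draw odds you might find at a sharper bookie) lowers it to roughly 4.4%, which is why low draw odds in the backtest data inflate the overround.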

What does this mean? Well, it means the historical AH results are probably lower than what can be achieved in a live environment. In hindsight, I’ve always wished I hadn’t adjusted the draw odds in the original backtesting for the UK systems, as it made the AH results slightly too good during backtesting. In this case, then, the AH results are going to look too low, so you can probably treat these AH returns as a base minimum for all the Euro systems. Something to be wary of when looking at the results.

So, after a few late nights, the first algorithm is complete. How does it look?

Well, similar to what I’ve done with the UK algorithms, I won’t ever publish the results from seasons 2000-2005. Quite simply, it doesn’t add anything to the analysis as the data is 100% backfitted and therefore, it is useless data as a means of projecting the future.

One small change compared to the UK systems is that I am now 7 years away from the last season of fully backfitted data, so I’ve made a conscious decision to change the part seasons I’ve used during the backfitting process.

I’ve used 50% of the data from 2006/07 and 2007/08 (identical to what I did with the UK algorithms) but I’ve also used 50% of the data from 2010/11. I feel that if I didn’t use any more recent data, the data in the model would be too far out of date, so I’m happy to sacrifice a season’s results to ensure I get a model that works going forward.

What it means is that when you look at results from 2010/11 (as well as the first two seasons), be very wary of them. They are NOT pure backtested results, so to speak, as 50% of the games in this season were used in the backfitting process. Hence, they are not 100% reliable but, then again, they are not 100% backfitted either.
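The partial-season split described above can be sketched as follows: each season gets a fraction of its games used for fitting, and everything else is held out. The season labels follow the post, but the data layout (a list of (season, match_id) tuples), the random per-game selection and the seed are assumptions; the post doesn’t say how the 50% was chosen.

```python
import random
from typing import Dict, List, Sequence, Tuple

Match = Tuple[str, int]  # (season, match_id): hypothetical data layout


def split_matches(
    matches: Sequence[Match],
    fit_fraction: Dict[str, float],
    seed: int = 0,
) -> Tuple[List[Match], List[Match]]:
    """Split matches into (fit, holdout) sets season by season.

    `fit_fraction` maps a season to the share of its games used for fitting
    (e.g. 0.5 for a part season, 1.0 for a fully backfitted one); seasons not
    listed are fully held out. Games are chosen at random (an assumption).
    """
    rng = random.Random(seed)
    by_season: Dict[str, List[Match]] = {}
    for m in matches:
        by_season.setdefault(m[0], []).append(m)

    fit: List[Match] = []
    holdout: List[Match] = []
    for season, games in by_season.items():
        shuffled = list(games)
        rng.shuffle(shuffled)
        k = round(fit_fraction.get(season, 0.0) * len(shuffled))
        fit.extend(shuffled[:k])       # used to fit the weightings
        holdout.extend(shuffled[k:])   # only ever used for testing
    return fit, holdout
```

Calling split_matches(matches, {"2006/07": 0.5, "2007/08": 0.5, "2010/11": 0.5}) would mirror the part seasons described above, leaving seasons such as 2008/09 entirely in the holdout set.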

Anyway, caveats aside, here are the results of the first European algorithm.

Similar to what I did when I first showed the UK algorithms to people, here are the results from the 3 seasons which are fully backtested. This should give an indication of what can be achieved going forward (ignoring the fact that I now know the single systems won’t achieve these results, as they didn’t with the UK bets).

The backtested results come out at about 75% of the overall results, so there isn’t a big discrepancy. It was bigger on the UK results! Hence, it’s OK to look at the full set of results without worrying that they are massively overstated.

Here’s the split by Homes and Aways.

The most interesting aspect of Euro algorithm one is that over 60% of the bets are Home bets! That is very different to the UK algorithms: 60% of the bets on UK algorithm one are Away bets, as are 77% on UK algorithm two and 66% on algorithm three.

That’s a significant shift from Aways to Homes but it fits in with the comments I made above. It seems much easier to make a profit on Home bets in the European Leagues and this is shining through on Euro algorithm one.

Interestingly, the ROI on Homes on Euro algorithm one is 15.7% and on Aways it is 19.2%. This is similar to what I found with the UK algorithms. Even though I could have had a much better ROI on the Home bets here, by ensuring I made a profit on the Away bets, I actually end up with a better return on the Aways! The same happened on the UK algorithms, but the other way around: the Homes ended up with fewer bets and a better ROI than the Aways.
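For reference, ROI figures like these are usually profit divided by total stakes. A minimal sketch of the Home/Away breakdown, assuming level stakes of one unit per bet and a hypothetical (side, decimal odds, won) record per bet; this is not the format of the actual results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Bet = Tuple[str, float, bool]  # (side "H" or "A", decimal odds, won?): hypothetical record


def roi_by_side(bets: Iterable[Bet]) -> Dict[str, float]:
    """Level-stakes ROI per side: total profit / total staked."""
    staked: Dict[str, float] = defaultdict(float)
    profit: Dict[str, float] = defaultdict(float)
    for side, odds, won in bets:
        staked[side] += 1.0                          # one unit per bet
        profit[side] += (odds - 1.0) if won else -1.0
    return {side: profit[side] / staked[side] for side in staked}
```

Because ROI is a ratio, a side with fewer bets can show the higher ROI while contributing less total profit, which is worth keeping in mind when comparing the Home and Away splits.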

Here’s the performance by each League:

Germany appears to be the strongest league and France the weakest. Germany does have the fewest bets though, which may partly explain the higher ROI we’re seeing.
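To put rough numbers on how much the number of bets matters, assume level stakes at a single decimal price o with a true win probability p. Each bet’s profit is then o·B − 1 with B a Bernoulli(p) variable, so its variance is o²·p(1 − p), and the standard error of ROI over n independent bets is o·√(p(1 − p)/n). The price, probability and bet counts below are illustrative, not the actual league figures.

```python
import math


def roi_standard_error(odds: float, win_prob: float, n_bets: int) -> float:
    """Approximate standard error of level-stakes ROI over n independent bets
    at a single decimal price, where each bet wins with probability win_prob."""
    return odds * math.sqrt(win_prob * (1 - win_prob) / n_bets)
```

At evens (odds of 2.0, p = 0.5), 100 bets give a standard error of about 10 percentage points of ROI, and it takes 400 bets to halve that to 5 points. A league with noticeably fewer bets can therefore show a higher ROI largely through noise.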

I think that’s enough of an introduction to algorithm one.  Of course, the results above will now become the results for system E1. 

The next stage is to filter these results and create system E2.


  1. Interesting reading, Graeme, as always. It's when I read posts like this that I wish I'd paid more attention to maths and stats when I was forty years younger.

  2. Hi Dave.

    I think you’re doing yourself a bit of a disservice mate!

    When I was reading your blog posts on databases, I didn’t have a Scooby what you were talking about as I’m not a database user at all. A strange one, but I tend to use PC SAS and Excel, which get me where I need to be. I’ve always tended to shy away from databases.

    I think when you see someone discussing something they are passionate about, and of course something they understand and use frequently, it can read like a different language to others who may not be as knowledgeable on the subject.

    What people don’t realise (although some probably do) is that all I’m doing with the footie is applying what I do in my day job. Instead of the usual pricing algorithms I work with, it’s footie algorithms. I have the tools, software and, of course, knowledge which enable me to do it better than most others at this game. By others, I include odds compilers, which I hope gives me an edge over them.

    Hence, I wouldn’t expect everyone reading to fully appreciate what I’m doing but they don’t have to. You just need to understand the results. ;)


  3. Nice to see some solid analysis over groundless opinion for a change.

    You state that you found "long-term form matters more in Europe". When you look at long-term form was it calculated over previous seasons or did you always restrict your analysis to what you call factors calculated over the current football season?

    Just wondering if there is more stability in EU football from one season to another? And if so, could that be utilised to gain some early-season advantage over the odds/bookies?

  4. Hi Dave.

    For the rating algorithms, the ratings are very much micro ratings and not macro ratings. By that I mean they only use detailed data from the current season and don’t even look at the longer term trends from previous seasons. Long-term form in this respect is therefore form relating to games that took place more than 4 games ago. 4 games or less would be defined as short-term form. Obviously, different weightings are placed on each game depending on how long ago it took place, with decaying weights as you go back in time.

    The closest I come to doing what you suggest is for systems TOX, STOY and STOZ which are defined by me as ‘Similar Game Model’ systems and these do take into account the longer term trends from previous seasons. However, any games thrown up are then cross referred with my other rating algorithms to ensure a consistency of approach.

    What I have found with the Euro ratings is that you need to place more weight on games that took place longer ago. If you kept the same weightings as in any of the UK algorithms (although the 3 UK algorithms have slightly different weightings), there doesn’t appear to be the same edge, although I must stress that this is a small part of the overall algorithm.

    Now, without pretending I can make sweeping generalisations here that are valid, I suspect what it means is that bookmakers place more weight on recent results in these top European Leagues. That would mean punters are placing more bets on teams with better short-term form against teams who appear to be out of form, since the bookies’ odds must reflect punters’ views. Hence, the edge for my ratings is to go against the grain (as I’ve always said, one purpose of ratings is to know when to go against the consensus), and I’ve ended up finding an edge in backing teams who appear out of form against teams in form. Not surprisingly, as always, my ratings go against what you would intuitively think.

    Of course, that’s just one of many factors in the model but it gives you an insight into how detailed the ratings are at times.

    If I tried my best to unpick why teams are thrown up as value, I think it would take me about a week per game, as the rating algorithm is much more intelligent than I am, and I gave up trying to understand the rationale for selections a long time ago! Actually, I think I gave up before I even started, as I know it’s impossible to fully understand the rationale for a high proportion of the bets.

    For the other bets I can understand, the thing I never understand is why they don’t appear on more systems! Hence, I don’t understand anything basically, and I own all the rating algorithms. lol

    Not sure if any of this makes sense but hopefully it helps fill in any gaps.