What I would say is that the factors in the two European
algorithms are IDENTICAL to the factors in the UK algorithms. When I first told
a few people I was going to be looking at the European leagues, they said I’d
no doubt find that other factors were more important in these leagues than in
the UK leagues. Nope, I didn’t and I did test for other factors but basically,
every factor that appears in the UK algorithm appears in the European
algorithm.
In a way, this is quite reassuring for me as it means the
work I did nearly 2.5 years ago is still valid! When I was looking at factors
you can use to build football ratings, I downloaded data from lots of different
leagues, dumped it in a spreadsheet and started to analyse it. I didn’t care
what league it was from, what time of year it was from and so on. My findings
then were that shots on goal and shots on target were the best underlying
indicator and that remains the case with these new European leagues I’m looking
at.
One big difference I can see between the UK algorithms
and the European algorithms is the fact that historical results have no impact
in the UK algorithms (it’s a factor in the model but with a weighting of 0 in
every algorithm) and yet, in the European algorithm, I can see that there
appears to be a correlation between historical results and future results in a
fixture. Hence, this factor carries some
weighting in the European rating algorithms.
Without giving away the weightings and variables, I’d say
short-term form matters more in the UK than in Europe whereas long-term form matters
more in Europe. That’s a definite trend
I picked up. I’d also say home form
matters much more in Europe than in UK Leagues.
The biggest difference by far between the UK and European
Leagues is the performance of Home and Away bets. Basically, it’s very easy
(almost hard to lose!) to build a rating algorithm to produce a fantastic
return on Homes but unfortunately, the better the algorithm is on Homes, the
worse it is on Aways!
It took me a fair bit of time to understand this dynamic
as when I first asked the program I have (basically a SAS program which takes in
the data and runs simulations) to maximise the return on some variables in the
algorithm, it was throwing up massive profits on Homes and massive losses on
Away selections for some variables! Hence, it was actually optimal to ignore
the Aways and concentrate on the Homes.
I did consider having a separate algorithm for Homes and
Aways but after discussing it with a few other football analysts, I decided
this was far too big a risk. For a
start, it would have meant that I could have ended up backing both teams in
some games, which would have given me a headache, and secondly, it is so
different to what I did with the UK algorithms, it means I would have been
going into the unknown.
Therefore, I reined back the Home selections a little
and found weightings in the model which ensured I could achieve a decent profit
on the Away bets too. Sacrificing a
little return on Homes to get a better return overall isn’t a bad thing and I
did a similar thing with the UK systems back at the beginning when I think back
but the opposite way around! All the profits were on the aways in the UK
leagues, so I pulled these back a bit and ended up finding some really good
home bets.
One other change between the original backtesting for the
UK systems and the Euro systems is the overround. In the UK systems, I adjusted the draw odds
to try to account for the fact that the draw odds I was using are far too low
and the overround was too high. If you use a bookie like Pinnacle for draw
odds, you can easily beat the draw odds I’m using in the backtesting results. There is NO adjustment in these results
though.
What does this mean? Well, it means the historical
results for AH results are probably lower than what can be achieved in a live
environment. In hindsight, I’ve always wished I’d never adjusted the draw odds in
the original backtesting for the UK systems as it made the AH results slightly
too good during backtesting. So, in this
case then, the AH results are going to look too low. So, you can probably see
these AH returns as a base minimum for all the Euro systems. Something to be wary of when looking at the
results.
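To make the overround point concrete, here is a minimal sketch in Python. It assumes decimal 1X2 odds, and it assumes the AH0 (draw no bet) price is synthesised from the home and draw odds, which is one common approach and not necessarily how the backtest prices AH bets. The function names and the prices are hypothetical.

```python
def overround(home_odds, draw_odds, away_odds):
    """Bookmaker margin implied by decimal 1X2 odds (0.0 means a fair book)."""
    return 1 / home_odds + 1 / draw_odds + 1 / away_odds - 1


def draw_no_bet_odds(side_odds, draw_odds):
    """Synthetic AH0 price: part of the stake covers the draw so it is refunded.

    Assumption: AH0 is built from the 1X2 prices like this. Lower draw odds
    mean more of the stake is needed to cover the draw, so the AH0 price drops.
    """
    return side_odds * (1 - 1 / draw_odds)


# Hypothetical prices for one match, not taken from the backtest data.
book = overround(2.10, 3.20, 3.60)
book_better_draw = overround(2.10, 3.40, 3.60)

ah0_low_draw = draw_no_bet_odds(2.10, 3.20)
ah0_better_draw = draw_no_bet_odds(2.10, 3.40)
```

With a better draw price the overround shrinks and the synthetic AH0 price rises, which is why unadjusted draw odds would tend to understate AH returns under this assumption.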
So, after a few late nights, the first algorithm is
complete. How does it look?
Well, similar to what I’ve done with the UK algorithms, I
won’t ever publish the results from seasons 2000-2005. Quite simply, it doesn’t
add anything to the analysis as the data is 100% backfitted and therefore, it
is useless data as a means of projecting the future.
One small change compared to the UK systems is the fact
that I am now 7 years away from the last season of fully backfitted data. I’ve made a conscious decision to change the
part seasons I’ve used during the backfitting process.
I’ve used 50% of the data from 2006/07 and 2007/08 (which
is identical to what I did with the UK algorithms) but I’ve also used 50% of
the data from 2010/11. I feel like if I
don’t use any more recent data, the data in the model would be too far out of
date and therefore, I’m happy to sacrifice a season’s results to ensure I could
get a model that works going forward.
What it means is that when you look at results from
2010/11 (as well as the first two seasons), be very wary of them. They are NOT true backtested results so to speak, as
50% of the games in these seasons were used in the backfitting process. Hence, they are not 100% reliable but then
again, they are not 100% backfitted either.
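The split described above can be sketched roughly as follows. This is only an illustration: the post does not say how the 50% of games were chosen, so random sampling is an assumption, and the function name, the match record layout and the `fully_fitted_seasons` parameter are hypothetical.

```python
import random

# Seasons where 50% of the games fed the fitting process, as described in the post.
PART_FITTED_SEASONS = {"2006/07", "2007/08", "2010/11"}


def split_matches(matches, fully_fitted_seasons, fraction=0.5, seed=0):
    """Split matches into a fitting set and a holdout (backtest) set.

    Assumption: part-fitted seasons are sampled at random, game by game.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    fit, holdout = [], []
    for match in matches:
        season = match["season"]
        if season in fully_fitted_seasons:
            fit.append(match)
        elif season in PART_FITTED_SEASONS and rng.random() < fraction:
            fit.append(match)
        else:
            holdout.append(match)
    return fit, holdout
```

Results reported for a part-fitted season mix both groups, which is why they sit somewhere between backfitted and backtested.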
Anyway, caveats aside, here are the results of the first
European algorithm.
Similar to what I did when I first showed the UK
algorithms to people, here are the results from the 3 seasons which are fully
backtested. This should give an
indication of what can be achieved going forward (ignoring the fact I now know
the single systems won’t achieve these results as they didn’t with the UK
bets).
The backtested results look about 75% of the overall
results, so there isn’t a big discrepancy.
It was bigger on the UK results! Hence, it’s OK to look at the full set of
results and not get drawn into worrying about the results being massively
overstated.
Here’s the split by Homes and Aways.
The most interesting aspect of Euro algorithm one is the
fact that over 60% of the bets are Home bets!
I can’t explain how different this is to the UK algorithms. On UK
algorithm one, 60% are Away bets, 77% are Away bets on UK algorithm two and 66%
are Away bets on algorithm three.
That’s a significant shift from Aways to Homes but it
fits in with the comments I made above. It seems much easier to make a profit
on Home bets in the European Leagues and this is shining through on Euro
algorithm one.
Interestingly, the ROI on Homes on Euro algorithm one is
15.7% and on Aways it is 19.2%. It’s similar to what I found with the UK
algorithms. Even though I could have had a much better ROI on the Home bets
here, by ensuring I made a profit on the Away bets, I actually end up with a
better return on the Aways! I did the same on the UK algorithms where the Homes
ended up with fewer bets and a better ROI than the Aways.
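For anyone wanting to reproduce a Home/Away split like this on their own bet records, a minimal sketch is below. The record layout (`side`, `stake`, `odds`, `won`) is hypothetical, decimal odds are assumed, and AH pushes or half wins are ignored for simplicity.

```python
from collections import defaultdict


def roi_by_side(bets):
    """Return {side: ROI}, where ROI = total profit / total staked."""
    staked = defaultdict(float)
    profit = defaultdict(float)
    for bet in bets:
        side = bet["side"]
        staked[side] += bet["stake"]
        if bet["won"]:
            profit[side] += bet["stake"] * (bet["odds"] - 1)  # winnings net of stake
        else:
            profit[side] -= bet["stake"]
    return {side: profit[side] / staked[side] for side in staked}


# Hypothetical bets, not taken from the results in this post.
sample_bets = [
    {"side": "Home", "stake": 1.0, "odds": 2.0, "won": True},
    {"side": "Home", "stake": 1.0, "odds": 2.0, "won": False},
    {"side": "Away", "stake": 1.0, "odds": 3.0, "won": True},
    {"side": "Away", "stake": 1.0, "odds": 3.0, "won": False},
]
split = roi_by_side(sample_bets)
```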
Here’s the performance by each League:
Germany appears to be the strongest league with France
being the weakest league. Germany does have the fewest bets though, so that
maybe explains the higher ROI we’re seeing.
I think that’s enough of an introduction to algorithm
one. Of course, the results above will
now become the results for system E1.
The next stage is to filter these results and create
system E2.
Interesting reading, Graeme, as always. It's when I read posts like this that I wish I'd paid more attention to maths and stats when I was forty years younger.
Hi Dave.
I think you’re doing yourself a bit of a disservice mate!
When I was reading your blog posts on databases, I didn’t have a Scooby what you were talking about as I’m not a database user at all. A strange one but I tend to use PC SAS and Excel which gets me where I need to be. Always tended to shy away from databases.
I think when you see someone discussing something that they are passionate about and of course, something they understand and use frequently, then to others who may not be as knowledgeable on the same subject, it can read like you’re talking a different language.
What people don’t realise (although some probably do) is that all I’m doing with the footie is applying what I do as a day job. Instead of the usual pricing algorithms I work with, it’s footie algorithms. I have the tools, software and of course, knowledge which enables me to do it better than most others at this game. By others, I include odds compilers, which basically gives me an edge over them, I hope.
Hence, I wouldn’t expect everyone reading to fully appreciate what I’m doing but they don’t have to. You just need to understand the results. ;)
Graeme
Nice to see some solid analysis over groundless opinion for a change.
You state that you found "long-term form matters more in Europe". When you look at long-term form was it calculated over previous seasons or did you always restrict your analysis to what you call factors calculated over the current football season?
Just wondering if there is more stability in EU football from one season to another? And if so could that be utilised to gain some early season advantage over the odds/bookies?
Hi Dave.
For the rating algorithms, the ratings are very much micro ratings and not macro ratings. By that I mean they only use detailed data from the current season and don’t even look at the longer term trends from previous seasons. Long-term form in this respect is therefore form relating to games that took place more than 4 games ago. 4 games or less would be defined as short-term form. Obviously, different weightings are placed on each game depending on how long ago it took place, with decaying weights as you go back in time.
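As a rough illustration of the short-term/long-term split with decaying weights, here is a minimal Python sketch. The 4-game cut-off comes from the description above. The geometric decay, the `decay` value and the function name are assumptions for illustration only, since the actual weightings are not published.

```python
def form_scores(results, decay=0.8, short_window=4):
    """Weighted short-term and long-term form from current-season results.

    results: per-game performance values, most recent game first.
    Games 1..short_window ago count as short-term form, older ones as long-term.
    Assumption: weights decay geometrically the further back a game is.
    """
    short_num = short_den = 0.0
    long_num = long_den = 0.0
    for games_ago, value in enumerate(results, start=1):
        weight = decay ** (games_ago - 1)
        if games_ago <= short_window:
            short_num += weight * value
            short_den += weight
        else:
            long_num += weight * value
            long_den += weight
    short = short_num / short_den if short_den else None
    long_term = long_num / long_den if long_den else None
    return short, long_term
```

Raising `decay` towards 1 puts relatively more weight on older games, which is the direction described for the Euro ratings.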
The closest I come to doing what you suggest is for systems TOX, STOY and STOZ which are defined by me as ‘Similar Game Model’ systems and these do take into account the longer term trends from previous seasons. However, any games thrown up are then cross referred with my other rating algorithms to ensure a consistency of approach.
What I have found with the Euro ratings is that you need to place more weight on games that have taken place longer ago. If you kept the same weightings as in any of the UK algorithms (although all 3 algorithms have slightly different weightings), then there doesn’t appear to be the same edge although I must stress, this is a small part of the overall algorithm.
Now, without trying to pretend I can make sweeping generalisations here that are valid, I suspect what it means is that bookmakers place more weight on recent results in these top European Leagues. By definition, this must mean punters are placing more bets on teams with better short-term form against teams who appear to be out of form since the bookies’ odds must represent punters’ views. Hence, the edge for my ratings is to go against the grain (as I’ve always said, one purpose of ratings is to know when to go against the consensus) and I’ve ended up finding an edge backing against teams in form with teams who appear out of form. Not surprisingly, as always, my ratings go against what you would intuitively think.
Of course, that’s just one of many factors in the model but it gives you an insight into how detailed the ratings are at times.
If I try my best to unpick why teams are thrown up as value, it would take me about a week to do a game I think as the rating algorithm is much more intelligent than I am and I gave up trying to understand rationale for selections a long time ago! Actually, I think I gave up before I even started as I know it’s impossible to understand fully the rationale for a high proportion of the bets.
For the other bets I can understand, the thing I never understand is why they never appear on more systems! Hence, I don’t understand anything basically and I own all the rating algorithms. lol
Not sure if any of this makes sense but hopefully it helps fill in any gaps.
Graeme