Thursday, 7 April 2011

TFA Ratings

I think it’s worth doing a quick post on some of the work that has gone into producing the ratings behind the systems. I’ve steered clear of this on the blog because, ultimately, this is my intellectual property and there are a lot of very intelligent people out there who could take what I’ve done and improve upon it. I don’t pretend that any of these ideas are original; they aren’t, and I’m not ashamed to say that I’ve picked up most of this from other sources in the public domain. However, what I’ve done with the data is different to anything I’ve seen before and, in that respect, my ratings are unique.

Firstly, a bit of background about me to try to fill in how I’ve reached this stage. I went to university to study Mathematics but switched to Financial Economics at the end of my second year. I went on to gain a first class honours BSc degree in Financial Economics. My passion was statistical modelling and, in particular, regression modelling.

When I went to work for an actuarial pricing team in the insurance industry after graduation, my modelling ability came to the fore and I now manage a small actuarial team within the organisation. I’m very comfortable with SAS, SQL and of course Excel. Data manipulation and statistical modelling are what I do in my day job, so to speak.

When I first turned my attention to looking at the football, it involved an initial 6 week process of trying to establish how I would go about building a statistical model from scratch. After a bit of research into what others had done in the past, it became fairly clear that others had put a lot of thought into this and therefore, I didn’t need to build something from nothing. All I had to do was understand what had been done before and improve upon it.

I make no apologies for not referencing my reading material here, but I hope you can understand why I don’t want to lead people towards working out exactly what I’ve done. In summary, the model would have to:

 Take account of different abilities of teams
 Take account of the fact that teams at home are stronger
 Take account of the fact that more recent form should take precedence over earlier form
 Take account of the ability of the teams involved when looking at the results of games and the performance of teams
 Lastly and most importantly, a team’s ability can be summarised by their ability to attack (score goals) and their ability to defend (concede goals)
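
To make the list above a little more concrete, here is a minimal sketch of how recency weighting, home advantage and separate attack/defence strengths could fit together. It is not the model behind the ratings. The function names, the half-life weighting, the multiplicative form and every number are assumptions made up for illustration, and the adjustment for opponent ability is left out.

```python
def recency_weighted_average(values, half_life=10.0):
    """Average of per-game figures (oldest first), weighting recent games more.

    Assumption: a game `half_life` matches old counts half as much as the latest one.
    """
    n = len(values)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def expected_goals(attack, opponent_defence, league_average, home_advantage=1.0):
    """Toy multiplicative model: strengths are relative, with 1.0 meaning average."""
    return league_average * attack * opponent_defence * home_advantage


# Hypothetical per-game strength figures, oldest first.
home_attack = recency_weighted_average([1.0, 1.2, 1.4])   # attack improving
away_defence = recency_weighted_average([1.1, 1.0, 0.9])  # defence improving (lower = better)

# Expected home goals under these made-up inputs.
print(expected_goals(home_attack, away_defence, league_average=1.4, home_advantage=1.2))
```

The half-life is just one convenient way to make recent form count for more. Any decreasing weighting scheme would serve the same purpose.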

I then set about seeing what data I could obtain which would enable me to build a statistical model. Importantly, I had to try to find something that could be updated regularly (after each set of fixtures) and of course, it had to be manageable within Excel preferably.

In summary, I captured data going back to 2002 which shows Home Goals, Away Goals, Home Shots, Home Shots on Target, Away Shots, Away Shots on Target for every game in the leagues I was interested in looking at.

I then spent the next few months trying to build a regression model that would allow me to build a rating algorithm for the football games. For those not familiar with this, it simply means using the historical data I’ve collected to calculate the strength of each team at the time each game took place. Combining the variables in my model then creates an equation which can predict the likely home goals and away goals in any game.

I then plug this into a Poisson distribution equation (again, this is common knowledge amongst football system builders and has nothing to do with my ideas!) and, after a bit of manipulation, this gives us the likely home win/draw/away win % in any game.
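
For anyone who hasn’t seen the Poisson step before, a minimal sketch is below. It assumes home and away goals are independent Poisson variables, which is the simplest textbook version and not necessarily the exact manipulation used for the ratings. The function names, the cap of 10 goals and the example expected goals are all hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def match_probabilities(home_xg, away_xg, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Assumption: home and away goals are independent Poisson variables.
    Scorelines above max_goals are ignored and the result is renormalised.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away
    return home / total, draw / total, away / total


# Hypothetical expected goals for one game.
home_p, draw_p, away_p = match_probabilities(1.6, 1.1)
print(f"home {home_p:.1%}, draw {draw_p:.1%}, away {away_p:.1%}")
```

Taking 1 divided by each probability gives the "fair" decimal odds implied by the model, which is what gets compared with the bookmaker’s price.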

I ran 100,000 iterations to achieve the optimal weightings for the regression equation and didn’t use the traditional regression method of minimising least squares. Simply, I didn’t want to backfit the systems so heavily that I restricted the number of games massively and ended up with a 100% strike rate over a dozen games. Instead, I worked it to goal seek a return of circa 20% across the backfitted games while maximising the number of games where possible, so that I had a large enough data set.

This gave me my first rating algorithm and then it was simply a case of testing this with the results in 2008 and 2009 and ensuring that the returns were in line with the backfitted years. As it turns out, the returns were within a few % points of the earlier ROI and my first rating algorithm was built.
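
The check described above boils down to comparing ROI on the backfitted seasons with ROI on seasons the algorithm has never seen. A minimal sketch follows, assuming each bet is recorded as a (stake, decimal odds, won) tuple. That record layout, the function name and the bets themselves are made up for illustration and are not real results.

```python
def roi(bets):
    """Return on investment for a list of (stake, decimal_odds, won) tuples."""
    staked = sum(stake for stake, _, _ in bets)
    profit = sum(stake * (odds - 1.0) if won else -stake
                 for stake, odds, won in bets)
    return profit / staked


# Invented bets purely to show the comparison.
backfitted_bets = [(1.0, 2.5, True), (1.0, 2.0, False), (1.0, 3.0, False), (1.0, 2.2, True)]
unseen_bets = [(1.0, 2.4, True), (1.0, 2.1, False), (1.0, 2.6, True), (1.0, 1.9, False)]

print(f"backfitted ROI: {roi(backfitted_bets):.1%}")
print(f"unseen ROI:     {roi(unseen_bets):.1%}")
```

If the two figures are a long way apart, that is usually a sign the weightings have been fitted too closely to the earlier seasons.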

What do I do with it now? That was the obvious question...

Clearly, backing every team that appears as value is a profitable method (system 6), but I also looked at filtering the selections by the amount of value that the ratings determined was inherent in each selection. This created systems 7, 8 and 9, which purely separate the bets out by the amount of value the ratings define in each game. The ratings can clearly differentiate between the games that appear as value bets.
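
As a rough illustration of filtering by value, the sketch below assumes "value" means model probability multiplied by the decimal odds on offer, minus 1. That definition, the function names and the cut-offs are assumptions for the example only; the real thresholds behind systems 6 to 9 are not given in this post.

```python
def value(model_probability, decimal_odds):
    """Edge of a bet: positive when the odds are bigger than the model's fair odds."""
    return model_probability * decimal_odds - 1.0


def value_tier(v, cutoffs=(0.0, 0.05, 0.10, 0.15)):
    """Count how many of the (hypothetical) cut-offs the value exceeds.

    0 means no bet, 1 means any value at all, and higher tiers are stricter filters.
    """
    return sum(1 for c in cutoffs if v > c)


# Invented selection: the model says 45%, the bookmaker offers 2.50.
v = value(0.45, 2.50)
print(f"value {v:.1%}, tier {value_tier(v)}")
```

Each stricter tier is a subset of the one before it, so the higher tiers trade a smaller number of bets for a bigger claimed edge per bet.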

Once I had built the first ratings algorithm, it was always going to be the case that a second algorithm would follow I think. Simply, when I was looking at the regression model for the first algorithm, I had lots of ideas about other ways to model this and therefore, I always wanted to go back and look at doing something a bit different by manipulating the data a bit differently.

After a month’s work or so, system 21 was built and, using the same idea as for systems 7-9, systems 22 and 23 were built.

It was purely by coincidence that systems 6-21 etc. were looked at as it was someone else who pointed out that when the two algorithms pick the same team out as a high value bet (system 22/23 and system 8/9), the results appeared to be exceptional. I looked into it a bit further and as we’ve seen this season, the results that can be achieved by doing this are as good as anything I’ve personally seen for football betting.
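
Finding the bets that both algorithms agree on is just an intersection of the two sets of selections. A minimal sketch is below, assuming each selection is stored as a (match id, team) pair. The identifiers and picks are invented for illustration.

```python
def agreed_selections(picks_a, picks_b):
    """Selections that appear as high value in both algorithms."""
    return sorted(set(picks_a) & set(picks_b))


# Hypothetical high-value picks from each algorithm for one weekend.
algorithm_one = [("M001", "Team A"), ("M002", "Team D"), ("M003", "Team E")]
algorithm_two = [("M001", "Team A"), ("M003", "Team F"), ("M004", "Team H")]

print(agreed_selections(algorithm_one, algorithm_two))
```

Note that the same match only counts when both algorithms pick the same team; in the example, match M003 drops out because the two algorithms disagree on the side.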

Now, people can read the above and make accusations that I’ve copied Dixon/Coles or Fink Tank and so on but at the end of the day, I’m not saying that any of this is my own idea. I’ve taken ideas developed by others before me and improved upon them and hopefully, can outperform the results anyone before me has achieved. That’s my aim.

3 questions people have always asked me about the ratings this season:

Will this method always work? I’ve no idea. I doubt it myself as if I look at the returns over the past 8 seasons, the returns have basically been on a downward trend. Hence, I don’t believe that using data from 2002-2006 can really help me predict the results of games in 2011 as well as it could in 2009 but I can always build new systems using more data as time goes on. Therefore, at the end of each season, I can build new algorithms with an extra season’s data and go through the same process as I’ve done before this season and hopefully, that will keep me ahead of the game and the bookmakers.

Why do my ratings work better than some others which use the same variables? I don’t know the answer to this one. I’m using very similar criteria for bets as other rating models and shots on goal is my main indicator and this is similar to what others use. My advantage is purely from my backfitting/backtesting I think as I’m quite comfortable with regression modelling and how to use it and I figure that’s probably my competitive advantage when it comes to football modelling.

How can we maximise profits and minimise risk using the ratings? Unfortunately, even though we are 80% through this season, I’m not sure I’ve quite got to this stage of the process yet. Saturday was a good example! I just want to get to a stage where I’ve got a proven way of making a return of 10%+ on the football over a season and, of course, I want to do it with the highest number of games and the lowest betting bank possible. That way, I can maximise profits and minimise the risk.

Anyway, I hope the above gives a flavour of what I’m doing this season. Unfortunately, this season was always going to be an experimental season and being honest, due to the need for me to tweak the algorithms each season with extra data, the systems will never be the same two seasons in a row. However, the base assumptions will stay the same in each algorithm and therefore, the systems should be as profitable one season to the next I hope.
