Thursday, 14 April 2011

Statistical Freak?

One point I should make before people start reading is that with any form of statistical analysis, you need as many data points as possible. That's not the same as saying you can't draw meaningful conclusions from a smaller sample of data, but you need to be careful that you don't get sucked into believing the conclusions too much, as they are unlikely to be statistically significant if the sample is small.

One thing I've looked at briefly before (after February's disaster) was the draws that the systems hit on a monthly basis.

In general, the systems pick out around 325 games per month on average. We can then look at the number of home wins, away wins and draws that appear each month. If I convert these numbers to percentages, we get the % of home wins, away wins and draws that occur each month.
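For anyone who wants to follow the sums, here's a rough sketch of that conversion in Python. The function name and the counts for the example month are made up for illustration; they are not taken from my results table.

```python
def result_percentages(home_wins, away_wins, draws):
    """Return (home %, away %, draw %) for one month, rounded to 1 decimal place."""
    total = home_wins + away_wins + draws
    if total == 0:
        raise ValueError("no games in this month")
    return tuple(round(100 * n / total, 1) for n in (home_wins, away_wins, draws))


# Hypothetical month of 325 games (made-up counts, purely for illustration).
example = result_percentages(133, 111, 81)
print(example)
```

Doing this for every month gives one home %, one away % and one draw % per month, which is what the rest of the post works with.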

This is shown in the table below.



Now, even just looking at this table with no statistical analysis at all, we can see a few things quite quickly.

On average, there are more home wins a month than either away wins or draws. Over the period of data (over 14,300 games), there have been 41% home wins, 34% away wins and 25% draws.

In addition, without being too smart, you can see that there is a range within which the percentage of each result bobs around on a monthly basis. Looking quickly, I'd say homes bob around between 35% and 50%, draws bob around between 20% and 30% and aways bob around between 25% and 35%.

However, with just a little bit of introductory statistics, we can analyse the results a little more scientifically.

We have 45 discrete months here, which means we have 45 data points each for Homes, Aways and Draws. By looking at the standard deviation of each sample, we'll be able to put a confidence level on how likely it is that the % stays within a range.

Simply, once you know the St Dev of a sample (and assuming the data is roughly normally distributed), you know that there is a 10% chance that any data point can lie outside the range if the range is defined as within 1.645 St Devs of the mean. Likewise, increasing this to 1.96 St Devs, there is a 5% chance a point can lie outwith the range. Lastly, by using 2.576 St Devs, there is a 1% chance that a data point can lie outwith the range.

Turning this on its head then, you can be 90%, 95% or 99% sure that any data point will lie within the chosen range.
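To show where those multipliers come from and how the ranges are built, here's a minimal sketch in Python using only the standard library. It assumes the monthly percentages are roughly normally distributed, and the list of draw percentages is invented for the example rather than copied from my table. The function names are my own.

```python
from statistics import NormalDist, mean, stdev


def z_multiplier(confidence):
    """Two-sided multiplier, e.g. 0.95 -> about 1.96 standard deviations."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def confidence_ranges(monthly_pcts, levels=(0.90, 0.95, 0.99)):
    """Return {level: (low, high)} as mean +/- multiplier * sample St Dev."""
    m = mean(monthly_pcts)
    s = stdev(monthly_pcts)
    return {
        level: (m - z_multiplier(level) * s, m + z_multiplier(level) * s)
        for level in levels
    }


# Made-up monthly draw percentages, purely for illustration.
sample = [24.0, 27.5, 21.3, 30.2, 25.1, 19.8, 26.4, 23.9]

for level, (low, high) in confidence_ranges(sample).items():
    print(f"{level:.0%} range: {low:.1f}% to {high:.1f}%")
```

The wider the confidence level, the wider the range, which is why a month has to be really extreme to fall outside the 99% boundary.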

I'm not really too interested in what is happening with Homes/Aways as they aren't the issue at the moment. Let's look at the draws to begin with.



I have highlighted blue any months where the draw % falls outwith the 90% boundary. Over a sample of 45 points, you'd expect this to happen between 4 and 5 times. It has happened 4 times in total.
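The "between 4 and 5 times" figure is just 10% of 45 months. As a rough check, the sketch below also works out how likely it is to see exactly 4 months outside the boundary, assuming each month is independent and has the same 10% chance (a simplification on my part, since real months may not be independent).

```python
from math import comb


def prob_exactly(k, n, p):
    """Binomial probability of exactly k hits in n independent tries."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


MONTHS = 45        # number of monthly data points
P_OUTSIDE = 0.10   # chance of a month falling outside the 90% boundary

expected = MONTHS * P_OUTSIDE
p_four = prob_exactly(4, MONTHS, P_OUTSIDE)

print(f"Expected months outside the boundary: {expected:.1f}")
print(f"Chance of exactly 4 such months: {p_four:.1%}")
```

So 4 months outside the 90% boundary is about as ordinary a result as you could get over 45 months.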

Looking closer though, it doesn't take a statistical expert to see how high the draw % has been in the last few months on these systems.

Jan-11 32.3%
Feb-11 38.5%
Mar-11 30.1%
Apr-11 41.7%

The first thing to point out is that February 2011 is the worst month ever for draws, and I quoted on here that I believed that could only happen once every two seasons or so. Looking at the data, the probability of it happening is somewhere between 1% and 5%, although given that the 95% confidence interval stops at 38.4%, the probability is very close to 5%.

5% of the time is once every 20 months which, based on my data, is about twice in nearly 5 seasons. Hence, February was a blip that I didn't think could be repeated.

Although we are only half way through April, the draw % is sitting at 41.7%. Considering the 99% confidence interval stops at 42.6% and the 95% interval stops at 38.4%, we are talking between a 1% and 2% chance of this happening.
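The two upper boundaries quoted above (38.4% at 95% and 42.6% at 99%) are enough to back out the mean and St Dev behind them, and from there a rough probability for any draw %. The sketch below does that in Python. It assumes the boundaries are symmetric around the mean and that the normal distribution is a fair approximation; the implied mean and St Dev are derived from the boundaries, not numbers read off my spreadsheet.

```python
from statistics import NormalDist

STANDARD = NormalDist()           # standard normal: mean 0, St Dev 1
Z95 = STANDARD.inv_cdf(0.975)     # about 1.96
Z99 = STANDARD.inv_cdf(0.995)     # about 2.576

UPPER_95 = 38.4   # upper end of the 95% range for draw % (from the post)
UPPER_99 = 42.6   # upper end of the 99% range for draw % (from the post)

# Two equations (mean + z * sd = upper boundary), two unknowns.
implied_sd = (UPPER_99 - UPPER_95) / (Z99 - Z95)
implied_mean = UPPER_95 - Z95 * implied_sd


def chance_outside(draw_pct):
    """Two-sided chance of a month at least this far from the mean."""
    z = abs(draw_pct - implied_mean) / implied_sd
    return 2 * (1 - STANDARD.cdf(z))


print(f"Implied mean {implied_mean:.1f}%, implied St Dev {implied_sd:.1f} pts")
print(f"Feb-11 at 38.5%: {chance_outside(38.5):.1%}")
print(f"Apr-11 at 41.7%: {chance_outside(41.7):.1%}")
```

Under those assumptions the implied mean comes out at roughly 25%, which ties in with the long-run draw % quoted earlier, with a St Dev of a little under 7 percentage points.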

Now, I don't want to get too drawn into statistical probabilities, as this is gambling after all and nothing is ever a sure-fire winner, but I will be astounded if the draw % for April ends up at 41.7%. That's basically saying that over 2 in 5 games finish as draws. Since 2 in 5 is the break-even strike rate at odds of 6/4, and the average draw odds are much larger than 6/4, anyone backing the draw in April would be raking it in. February made over 100pts backing the draw and April would be on course to beat that!
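The link between "2 in 5 games" and 6/4 is simple arithmetic, sketched below in Python. The function names are my own, and the calculation assumes level 1pt stakes at a single fixed price, which is a simplification since real draw odds vary from game to game.

```python
from fractions import Fraction


def implied_probability(numerator, denominator):
    """Break-even strike rate for fractional odds numerator/denominator."""
    return Fraction(denominator, numerator + denominator)


def profit_per_point(win_rate, numerator, denominator):
    """Average profit per 1pt staked at the given fractional odds."""
    odds = numerator / denominator
    return win_rate * odds - (1 - win_rate)


break_even = implied_probability(6, 4)
edge_at_6_4 = profit_per_point(0.417, 6, 4)

print(f"Break-even strike rate at 6/4: {float(break_even):.0%}")
print(f"Profit per point at 6/4 with 41.7% draws: {edge_at_6_4:.3f}")
```

At 6/4 a 41.7% strike rate is already in profit, so at anything bigger than 6/4 the profit per bet only grows.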

In 2011, there have been 1,323 games picked out by the systems and 459 have resulted in draws. This is 34.5% of the games finishing draws! Looking back, I can't see a run like this at any time within my data and therefore, what we are witnessing is a statistical freak that can't continue.

Before I conclude, let's look at the away win % in April. This is down at 16.7%. The worst month ever for Aways was a win % of 18.8%. This month is therefore around 12% worse (or 2.1 percentage points lower) than any other month EVER for away wins on the systems. Since most of my selections are away bets, this means my systems have to lose this month.

Overall then, what does all the above mean?

Well, I reckon we are experiencing a freak occurrence with draws since Christmas. In particular, February was the worst month ever experienced for my systems and the worst month ever for the % of draws. April is now ahead of February, although we still have the second half of the month to go. The draw % has to drop this month (100/1 to not drop I think!) and hopefully, the win % on the away bets will increase as the draws reduce and things will get back to something more like normal.

The great thing about seeing this up close during this season is that I'll be more mentally prepared for this in future I think. I know the boundaries of what can happen and I also know that quite often, the results achieved will be outside the boundaries but due to the fact I've seen it this season, it won't surprise me when it happens again.

Will things ever get back to the norm? We'll see what this weekend brings........

2 comments:

  1. Nice analysis as always man. Don't give up :)

  2. Cheers for the comment! I won't be giving up just yet.

    Graeme
