Allsvenskan 2016 update

With three rounds of Allsvenskan games played since my last post, it's time for an update. Like last time, I'm just going to throw a few visualizations at you together with my initial thoughts, without going too in-depth.

Starting out with the league table, we see just how close the league has been so far – with eight games played, only four points separate Östersund in 11th place from Malmö in 2nd.

allsv_update_01_01

We can also see some interesting streaks since last time, with Norrköping and Elfsborg winning all three games while Hammarby, Gefle and Falkenberg have been struggling. Looking at the early surprise teams we see that Sundsvall have continued to perform well while Jönköpings Södra have dropped in the table.

allsv_update_01_02

Looking at shots we see how Hammarby, Kalmar and Norrköping have all moved into the 'busy attack, quiet defence' quadrant, indicating that they've played a bit better lately (or faced easier opposition!), while Sundsvall are still stuck in the 'quiet attack, busy defence' quadrant.

allsv_update_01_03

While Malmö produce a lot of shots, they're still one of the most ineffective sides up front. Göteborg and Norrköping on the other hand are enjoying some effective scoring at the moment.

allsv_update_01_04

Sundsvall are still conceding a lot of shots, but at least those shots aren't converted into goals very often – which in part explains their good results so far. Elfsborg have moved into the 'formidable' defensive quadrant, only conceding one goal in the last three games.

allsv_update_01_05

Looking at Expected Goals, Malmö are still clearly the best team, with Norrköping improving while Djurgården have dropped a bit. Here we really see the difference in the early surprise teams' recent performances, as Sundsvall have improved both their attacking and defensive numbers while Jönköpings Södra have done the opposite.

So how would the teams rank xG-wise? Expected Goals Difference should do well as a measure of skill, and here we again see how the model ranks Malmö as the best side so far, with Norrköping and AIK the main contenders. A bottom three of Helsingborg, Gefle and Falkenberg has also emerged.

allsv_update_01_06

Another way of evaluating the teams’ performance so far is to simulate how many points on average each team would’ve received from their games. To do this I’ve used the shots from each game to simulate the result 10,000 times and the teams have then been awarded Expected Points based on the derived 1X2 probabilities.

For example, if the simulation came up with probabilities of 0.5, 0.3 and 0.2 for the home win, draw and away win, then the home side would be awarded 0.5*3 + 0.3*1 = 1.8 Expected Points, while the away side would get 0.2*3 + 0.3*1 = 0.9 Expected Points.
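To make the calculation concrete, here's a simplified Python sketch. It treats every shot as an independent coin flip weighted by its xG value, which is a simplification of the full game simulation, and the function name is just for illustration:

```python
import numpy as np

def expected_points(home_xgs, away_xgs, n_sims=10_000, rng=None):
    """Rough Expected Points for one game, given the xG value of each shot.

    Every shot is treated as an independent coin flip weighted by its xG,
    a simplification of the full game simulation described above.
    """
    rng = rng or np.random.default_rng()
    home_xgs, away_xgs = np.asarray(home_xgs), np.asarray(away_xgs)
    # Simulate all shots n_sims times and sum up goals per simulated game.
    home_goals = (rng.random((n_sims, home_xgs.size)) < home_xgs).sum(axis=1)
    away_goals = (rng.random((n_sims, away_xgs.size)) < away_xgs).sum(axis=1)
    # Derived 1X2 probabilities.
    p_home = (home_goals > away_goals).mean()
    p_draw = (home_goals == away_goals).mean()
    p_away = (home_goals < away_goals).mean()
    # Award points according to the derived probabilities.
    return 3 * p_home + p_draw, 3 * p_away + p_draw

# Example: a home side with three decent chances against one big away chance.
home_xp, away_xp = expected_points([0.10, 0.25, 0.40], [0.35])
print(round(home_xp, 2), round(away_xp, 2))
```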

Here's a table of the teams' Expected Points so far:

allsv_update_01_07

But a team can't get 1.8 points from a game – only 0, 1 or 3 – so how have the teams performed compared to their Expected Points?

allsv_update_01_08

Note: Malmö have been awarded a 3-0 win against Göteborg as the game was abandoned due to home fans throwing pyrotechnics towards a Malmö player. These points have been included.

Here we see how Helsingborg and Sundsvall have taken quite a lot more points than expected, while Falkenberg and Kalmar have done the opposite. This could be the result of some good or bad luck, but it could also mean that the model fails to properly assess the quality of these teams.

Let’s dig deeper and have a look at the Expected Points distribution of each team:

allsv_update_01_10

Looking at these distributions we can see just how extreme the results have been for some of the teams so far. In fact, my model estimates that if we re-played Helsingborg's games 10,000 times, they would get 13 points or more only about 5% of the time!
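These distributions can be approximated by drawing each game's outcome from its simulated 1X2 probabilities and summing up the points. A quick sketch, with made-up probabilities standing in for the real ones:

```python
import numpy as np

def simulate_point_totals(game_probs, n_sims=10_000, rng=None):
    """Simulate a team's point total given per-game (win, draw, loss) probabilities."""
    rng = rng or np.random.default_rng()
    totals = np.zeros(n_sims, dtype=int)
    for p_win, p_draw, p_loss in game_probs:
        totals += rng.choice([3, 1, 0], size=n_sims, p=[p_win, p_draw, p_loss])
    return totals

# Made-up probabilities for eight games, then the share of simulated
# "seasons so far" in which the team reaches 13 points or more.
probs = [(0.25, 0.30, 0.45)] * 8
totals = simulate_point_totals(probs)
print((totals >= 13).mean())
```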

Lastly, here’s my updated prediction of the final Allsvenskan 2016 table:

allsv_update_01_09

That's it for now. I hope to be back with another update when the league has gone on break for Euro 2016, and maybe I'll take a closer look at individual players then.



Allsvenskan 2016 so far

With 5 rounds of games played I thought it would be a good time to look at how the 2016 Allsvenskan is going. Let’s have a look at the league table so far:

2016_00

True to its rather unpredictable nature, the opening five rounds of the 2016 Allsvenskan have seen some surprises, and I don't think anyone expected Sundsvall and newly-promoted Jönköpings Södra to be at the top! Also, last year's top teams have been struggling a bit, but seem to have picked up the pace lately.

But never mind the table – although it never lies, it can give an unfair view of the teams' underlying performances, especially with so few games played. To really see how the teams have been coming along so far I've reproduced some of Ben Mayhew's beautiful scatterplots:

2016_01

Looking at shots taken and conceded per game we can see how Malmö, Djurgården and AIK have dominated their games so far, outshooting their opponents by some margin. League leaders Sundsvall have, given their results, surprisingly spent most of their time in defence – but that's just how Allsvenskan is.


2016_02

When looking closer at shooting effectiveness we see that surprise teams Sundsvall and Jönköpings Södra have been clinical in front of goal so far, partly explaining their results. Häcken on the other hand have really struggled to score.


2016_03

Looking at defensive effectiveness we can really see why Sundsvall are at the top of the Allsvenskan table. While spending a lot of time in defence, they've managed to concede very few goals given the shots they've faced. Whether this is down to some new tactic, skill or simply dumb luck remains to be seen – but for a team like Sundsvall I'm willing to say it's the latter.


2016_04

Expected Goals-wise we see just how lucky Sundsvall have been so far. They've conceded a lot of xG while failing to produce up front, putting them in the same group as struggling sides Falkenberg, Häcken, Helsingborg and Gefle. Malmö are at the other end of the scale, producing a lot of high-quality chances while keeping a tight defence.

Another interesting thing to look at is time spent in different Game States. As a result of their good performance (or luck!) so far, Sundsvall have only spent about 1% of minutes played losing, while Gefle have only spent 10% in the lead!

2016_05
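Game State minutes like these can be counted up from the goal times in each game. Here's a minimal sketch of that bookkeeping, assuming each goal is given as a (minute, change) pair from the team's perspective and ignoring stoppage time:

```python
def game_state_minutes(goal_events, match_length=90):
    """Minutes a team spends winning/drawing/losing in one game.

    goal_events: list of (minute, goal_diff_change) tuples from the team's
    perspective, e.g. [(23, +1), (70, -1)] for scoring in the 23rd minute
    and conceding in the 70th. A simplified sketch that ignores stoppage time.
    """
    minutes = {"winning": 0, "drawing": 0, "losing": 0}
    diff, prev_min = 0, 0
    for minute, change in sorted(goal_events) + [(match_length, 0)]:
        state = "winning" if diff > 0 else "losing" if diff < 0 else "drawing"
        minutes[state] += minute - prev_min
        diff += change
        prev_min = minute
    return minutes

print(game_state_minutes([(23, +1), (70, -1)]))
# {'winning': 47, 'drawing': 43, 'losing': 0}
```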

What about a prediction for the rest of the season then? I’ve used the games so far to fire up my league table simulation based on my Monte Carlo xG game simulation, and this is the result:

predict_2016_05

Note: The first table posted here was wrong due to a minor error in the code. This is the correct table.

As a Djurgården supporter, I kinda like the result – even though I think it's a bit unrealistic for us to compete for silverware just yet. And anyway, a simulation of the whole season based on only five games tells more about what has happened so far than about what we'll see in the future, at least in my opinion.

The model clearly ranks Malmö as the best team in the league, as it has done pretty much every season in my database, alongside AIK and Göteborg. Both the newly promoted teams, Östersund and Jönköpings Södra, seem competent xG-wise and have a good chance of staying up, while reigning champions Norrköping seem to be performing worse than last year. Gefle are always at the bottom of these kinds of tables, but nevertheless seem to outsmart every available metric and avoid relegation season after season – but maybe this is the year they finally drop down to Superettan?

I'm planning to do these kinds of updates at regular intervals, and maybe add some more plots and deeper analysis, but this will have to do for now!


A rough prediction of the new Allsvenskan season

Though I hadn't planned on posting a prediction for the new Allsvenskan season until a couple of rounds had been played, seeing Per Linde of fotbollssiffror post his prediction on Twitter – and mention how he disagreed with it – made me decide to do the same and fire up my league table prediction script from last season.

My Monte Carlo game prediction is designed to use at least a couple of rounds of data, so I was unsure how it would go about predicting a new season right from scratch, but I actually think it turned out better than expected:

predict_2016_01

There are some obvious problems though. First off, the script still thinks it's 2015 and that Jönköpings Södra and Östersund are playing in Superettan, causing a strange error where every one of their games is simulated as a 0-0 draw. This obviously skews the prediction for every team, but it isn't really an error as the league simulation script isn't designed to handle different leagues, and it will be corrected when I update the database with the weekend's results.

Also, every game is simulated with the teams' squads as they were at the end of the 2015 season, which is obviously a problem with a lot of players having come and gone since then – but again, this will be fine once I update the database.

What about the actual prediction then? Besides the error with the promoted teams, my only subjective quibbles are the high percentages for Gefle's relegation (they've been written off for as long as I can remember but have still managed to stay up year after year) and for Norrköping's title defence. I'd also switch places between Djurgården and Häcken, and place Örebro somewhere in lower mid-table. The promoted sides are hard to predict, but I'd definitely place Östersund above Jönköpings Södra.

Though I’m pretty happy with the prediction, I’ll update it in another post once a couple of rounds have been played.


World Premiere(?): Expected Goals for Finland’s Veikkausliiga

A while back I stumbled upon shot location data for Finland’s top league, Veikkausliiga. I haven’t seen an Expected Goals model for this league before so despite having no interest in or knowledge of the league, I decided to develop a model for it based on my Expected Goals model of Swedish football. My idea is that a model could be a very useful tool and make a big difference when betting these smaller, lesser-known leagues.

Unfortunately only one season of data is available, and as with the Swedish data no distinction is made between shot types besides penalties. But the overall quality seems to be of a higher standard than its Swedish counterpart, and the data also contains more detailed player metrics like number of accurate passes, fouls, turnovers, etc., which might prove useful in the future.

Model results

FIN_01

First off I tested whether the Finnish data is significantly different from that in my Swedish model. It turns out it is, but as one season of data is probably not enough to develop a decent model, I've opted to add the new data to my existing model and use the combined model for Veikkausliiga. No Finnish data will be used when dealing with Swedish games, however.
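I won't go into the details of the test, but one way such a check could look – with placeholder column names for a shots table containing a goal indicator, shot location variables and a league label – is to add a league dummy to a logistic shot model and inspect its significance:

```python
import pandas as pd
import statsmodels.formula.api as smf

def league_effect(shots: pd.DataFrame):
    """Fit a logistic shot model with a league dummy and return its p-values.

    Expects columns goal (0/1), distance, angle and league ('SWE'/'FIN').
    These names are placeholders, not the real schema of my database.
    """
    model = smf.logit("goal ~ distance + angle + C(league)", data=shots).fit(disp=False)
    return model.pvalues

# A small p-value for the league dummy would suggest Finnish shots convert
# differently from Swedish ones once shot location is controlled for.
```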

Let's look at some plots of how the model rates the teams and players in Veikkausliiga:

FIN_02
FIN_03
FIN_04
FIN_05

Data from the Swedish leagues is colored red and not included in the regressions.

FIN_06

What we can see is that the r-squared values for xG/G are worryingly lower than the Swedish model's 0.61. Also, the model does a better job explaining team defence than attack, just like the Swedish model. Why that is I don't know.

The model rates HJK as the best team in terms of both xG for and xG against, but they only finished third – albeit just two points behind champions SJK, who seem to have overperformed massively, with a goal difference about 13 goals better than expected.

At the bottom of the table, KTP seem to have overperformed while demoted Jaro underperformed. Mariehamn also seemingly underperformed both in attack and defence.

FIN_07
FIN_08

Looking at individual players, I'd say the model performs well, with an r-squared of 0.8, similar to that of the Swedish model. RoPS' Kokko had the highest xG numbers to go with his title as top scorer, and interestingly, all of the top 10 goalscorers outscored their xG numbers.

Betting backtest

While the model doesn’t seem to be as good as my Swedish model, I still think it’s reasonably good considering only one season of data from the league is used. But what about its performance on the betting market?

Just like I did with Allsvenskan, I've simulated each game using my Monte Carlo method for game simulation. Obviously only data available prior to each game is used, and since my method relies heavily on long-term team and player performance, my initial guess was that using it for the 2015 Veikkausliiga wouldn't be profitable as there simply isn't enough data. Well, let's see.

backtest_09

Running the backtest, my suspicion immediately proved right, as can be seen in the plot above. The model looks like a clear loser, and setting a minimum EV when betting doesn't seem to change that. But looking at the plot, there's actually a point late in the season where the model starts to perform better.

Since the model was at a huge disadvantage from the start with so little data (the Allsvenskan backtest used four seasons of data), I’ll allow myself to do some cherry picking. Here’s how the model performs betting Pinnacle’s odds after the international break in September:

backtest_10

backtest_11
backtest_12

Just like before, Model 2 is a variation of my Monte Carlo game simulation where home field advantage is weighted more heavily. As with Allsvenskan, both models seem to focus on underdogs and higher odds. What is encouraging is that this time only a minimum EV threshold of 5% is needed to single out a reasonable number of bets; in my Allsvenskan backtesting a threshold of 50% was needed, indicating that the model was probably skewed in some way.
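For reference, the EV filter itself is straightforward: the model's probability times the decimal odds, minus one, and only outcomes above the threshold are bet. The structure of the games input below is hypothetical:

```python
def expected_value(p_model, decimal_odds):
    """Expected profit per unit staked, given the model's probability and the
    bookmaker's decimal odds."""
    return p_model * decimal_odds - 1

def pick_bets(games, min_ev=0.05):
    """Keep the outcomes whose model edge over the bookmaker exceeds min_ev.

    games: list of dicts with the model's 1X2 probabilities and the bookmaker's
    decimal odds for each outcome (a made-up structure for illustration).
    """
    bets = []
    for game in games:
        for outcome in ("1", "X", "2"):
            ev = expected_value(game["probs"][outcome], game["odds"][outcome])
            if ev >= min_ev:
                bets.append((game["id"], outcome, ev))
    return bets

# Example: the model gives the away side a 30% chance at odds of 4.00 -> EV = 0.20.
print(expected_value(0.30, 4.00))
```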

Like in the Allsvenskan backtesting the model makes a killing on the 1X2 market due to its ability to sniff out underdogs. There’s also some profit to be made on Asian Handicaps while only Model 1 makes a profit betting Over/Unders.

I've also run the backtest against Matchbook's odds, and while I won't bore you with more plots and tables, I can say that the results again match my findings from the Allsvenskan backtesting. At Matchbook, betting the 1X2 market is still hugely profitable, the Asian Handicaps close in on odds around even money, while Over/Unders perform better, albeit only at closing odds.

Conclusion

As expected, betting on Veikkausliiga from the start of the season would've proved a dismal affair. This is understandable since my method relies so heavily on long-term performance, and using only a couple of games to assess player and team quality isn't a good idea.

But the model did seem to perform better late in the season, and while this probably isn’t enough for me to use it for betting on the upcoming 2016 Veikkausliiga season, I’ll keep my eyes on its performance against the market and maybe jump in when it seems to be more stable.



Predicting the final Allsvenskan table

With the Swedish season soon coming to an end, it's a good time to see how the Expected Goals model predicts the final table. With only three games left, a top trio of this season's big surprise Norrköping, just ahead of Göteborg and AIK, is competing for the title of Swedish Champions. At the opposite end of the table Åtvidaberg, Halmstad and Falkenberg look pretty stuck, with the latter two battling it out for the possible salvation of the 14th-place relegation play-off spot.

predict_table_01

Let's take a look at the remaining schedule for the top three teams:

Norrköping have two tough away games left, against Elfsborg and Malmö, who are both locked in a duel for 4th place, which could potentially mean a place in the Europa League qualification. Elfsborg are probably the tougher opponent here, with reigning champions Malmö busy in the Champions League group stage. Between these two away games, Norrköping play at home against Halmstad, who are fighting for survival at the bottom of the table.

Göteborg have two tough away games of their own, first at Djurgården and later a very important game against fellow title contenders AIK. This game will probably decide which of the two will challenge Norrköping for the title in the last round. Göteborg finish the season at home to Kalmar, who could well be playing for their survival in this last game.

AIK have the best remaining schedule of the three top teams, with away games at Halmstad and Örebro on either side of the crucial home game against Göteborg. As mentioned, Halmstad are fighting for their Allsvenskan survival, while Örebro's recent great form has seen them through to a safe spot in the table.

At this late stage of the season there are a lot of psychological factors in play, with the motivation and spirit of teams and players often being connected to their position in the table. These aspects are very hard to quantify and have not been incorporated into my model, so my prediction of the table relies solely on my Expected Goals model used in Monte Carlo simulation. I won't reveal exactly how I simulate games, but the subject will probably be touched upon in a later post, so I'll spare you the boring technical details for now.

Each of the remaining 24 games has been simulated 10,000 times. For each of these fictional seasons I've counted up the points, goals scored and goal difference for every team to come up with a final table for that season. Lastly, I've combined all these seasons into a table with expected points and probabilities for each team's possible league positions.
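Here's a rough sketch of that aggregation step. The simulate_game function stands in for my shot-based Monte Carlo game simulation, and the ranking below uses points only, whereas the real table also applies goal difference and goals scored as tie-breakers:

```python
import numpy as np
from collections import defaultdict

def simulate_final_tables(current_points, fixtures, simulate_game, n_sims=10_000):
    """Turn per-game Monte Carlo results into final-position probabilities.

    current_points: dict of team -> points so far.
    fixtures: list of (home, away) pairs still to be played.
    simulate_game: hypothetical function returning (home_goals, away_goals)
    for one simulated playing of a fixture.
    """
    n_teams = len(current_points)
    position_counts = defaultdict(lambda: np.zeros(n_teams, dtype=int))
    for _ in range(n_sims):
        points = dict(current_points)
        for home, away in fixtures:
            hg, ag = simulate_game(home, away)
            if hg > ag:
                points[home] += 3
            elif hg < ag:
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
        # Rank by points only here; the real table also uses goal difference
        # and goals scored as tie-breakers.
        table = sorted(points, key=points.get, reverse=True)
        for pos, team in enumerate(table):
            position_counts[team][pos] += 1
    # Convert counts to probabilities of finishing in each position.
    return {team: counts / n_sims for team, counts in position_counts.items()}
```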

predict_table_03

The model clearly ranks Norrköping as the most likely winner, with Göteborg as the main contender, while AIK's chances of winning the title are only about 18%. The bottom three look rather fixed in their current positions, with Falkenberg having only a 2% chance of overtaking Kalmar for the last safe spot in the table. In mid-table things are still quite open, even though Djurgården's season is pretty much over, with an 89% chance of placing 6th. Malmö seem to have an advantage over Elfsborg in the race for 4th place, but given their Champions League schedule their chances should probably be lower than the model predicts.

I'll probably be posting updated predictions on my Twitter feed after each of the top teams' remaining games, to see how the results change the predictions.


The Model part 1 – Exploring Expected Goals

In a series of posts I will be covering the work done on, and with, my model of Swedish football. In this first part of the series I'll talk about the underlying concept upon which the model is built – Expected Goals.

We've all seen those games where the result ended up being extremely unfair given how the game played out. Maybe the dominant team had a spell of bad luck and conceded an own goal while missing their clear chances, or the opposing goalkeeper played the game of his career making some huge saves, or maybe the lesser side luckily managed to score from their only real chance. All these scenarios point to the same thing – there's a lot of randomness associated with goals. We often see teams play great and still lose, while a poorly playing side takes home all three points.

Because of this random nature of football, only looking at results and goals scored and conceded is not a good way to assess true team and player strength. Sure, good teams usually win, but they also sometimes run into spells of bad form and perform worse, while bad teams sometimes go on a good run, securing that last safe spot in the table just in time before the season ends.

To combat this problem, the football analytics community has turned its eyes to a more stable part of the game – shots – in the hope that these will exhibit less randomness and hold more explanatory power. While it is certainly true that examining how many shots a team produces and concedes can tell you more than goals alone, the same problem with randomness exists here too. Good teams usually take more shots than they concede, but as we all know, this is not always the case.

Expected Goals aims at getting down to the core of why good teams perform well and bad teams perform worse, and in the process avoids some of the problems associated with just summing up goals and shots. It is based on the notion that good teams take more shots in good situations while bad teams do the opposite. The same is true in defence, as good teams concede fewer shots in good situations than bad teams do. The hope is that these characteristics will be less random and more useful in explaining and predicting football.

In its essence, Expected Goals gives you a value for how often a typical shot of a given kind ends up in the net, and this is done by examining huge datasets in a number of different ways. Usually an Expected Goals model is based on where on the pitch the shot was taken, and the reason for this is quite clear once you come to think about it – it all comes down to shot quality. Imagine two different scoring opportunities, the first 25 meters out from goal and the other 5 meters from goal. In traditional football reporting these two shots are treated just the same, but we all know that the latter is preferable since it is closer to goal and probably an easier chance to convert.
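To make the idea concrete, here's a bare-bones sketch of a location-based model – just a logistic regression of goal probability on shot distance, far simpler than any model you'd actually want to use:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_distance_xg(distances, goals):
    """Fit a minimal location-based xG model: P(goal) as a function of shot distance.

    distances: distance to goal in metres for each historical shot.
    goals: 1 if the shot was scored, 0 otherwise.
    """
    model = LogisticRegression()
    model.fit(np.asarray(distances).reshape(-1, 1), np.asarray(goals))
    return model

# Usage sketch: compare the two chances from the example above (25 m vs 5 m).
# model = fit_distance_xg(shot_distances, shot_outcomes)   # historical shot data
# print(model.predict_proba([[25.0], [5.0]])[:, 1])        # per-shot xG values
```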

Given the different methods, ideas and datasets football analysts work with, there's no single right way to calculate an Expected Goals or xG value. For example, an ambitious analyst might account for not only where the shot was taken, but also what type of shot it was, what kind of pass preceded it, whether the player dribbled before taking the shot, etc. The possibilities are only limited by the data, and with the likes of Opta covering the top European leagues, these are vast.

Let's take a look at a real example. In my database (more on that in a later post) I have 243 penalties recorded, of which 192 ended up in goal. To get the xG value for a penalty we just calculate the fraction of penalties which turned into goals, in this case 192/243, or about 0.79. In comparison, the xG value for a shot taken from the penalty spot during regular play is estimated by my model to be about 0.25, which makes sense since it's a harder shot than the penalty.

As shown by several football analysts (for example on the blog 11tegen11), Expected Goals holds some real power in explaining football results. But it also has its weaknesses. There's currently no way of accounting for the position of the defenders when the shot was taken, which surely would affect scoring expectation. Furthermore, Expected Goals only deals with actual shots taken, but as we all know, not all scoring chances produce a shot. It's also true that xG values are averages, meaning that there's actually a whole range of different expectations for different players. Surely Leo Messi has a higher chance of scoring than Carlton Cole in nearly every situation.

To me, the real strength of Expected Goals lies in the fact that we can treat it as a probability and use it in simulations to examine more complex situations. Take the penalty, for example. With an xG value of 0.79, we can expect an average player to score most of the time, but he'll also miss some shots. In fact, it's not uncommon for him to miss several shots in a row. With the help of Monte Carlo simulation (again, more on that in later posts), we can examine the nature of the penalty shot more closely. Let's say we get our player to take 10,000 penalty shots in a row. How many will he make?
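The simulation itself is about as simple as Monte Carlo gets – 10,000 coin flips weighted by the penalty xG value, tracking the running conversion rate:

```python
import numpy as np

rng = np.random.default_rng()

XG_PENALTY = 0.79          # empirical penalty conversion from above
N_SHOTS = 10_000

# Each penalty is a Bernoulli trial with p = 0.79.
made = rng.random(N_SHOTS) < XG_PENALTY
running_rate = np.cumsum(made) / np.arange(1, N_SHOTS + 1)

print(made.sum())          # total penalties scored, should land near 7900
print(running_rate[-1])    # long-run conversion rate, close to 0.79
```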

Pen_sim_01

As we can see, our player started out by making his first shot, only to quickly drop below the expected 79% scoring rate, but as he took more and more shots he slowly moved towards it. He actually scored 7928 penalties, which is very close to the expected 7900.

Let's try a more complex simulation just for fun. Imagine a penalty shoot-out. How likely is it to score on all five shots? Four out of five? My database doesn't contain any penalty shoot-outs, but my guess is that these are converted at a slightly lower rate than regular penalties, either due to the stress involved or maybe fatigue. But let's use our standard xG value of 0.79 for simplicity and simulate 10,000 shoot-outs with five penalties each.
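With every penalty treated as independent and converted at 0.79, the number of goals in a shoot-out is simply a Binomial(5, 0.79) draw:

```python
import numpy as np

rng = np.random.default_rng()

XG_PENALTY = 0.79
N_SHOOTOUTS = 10_000

# Five independent penalties per shoot-out -> a Binomial(5, 0.79) draw.
goals = rng.binomial(n=5, p=XG_PENALTY, size=N_SHOOTOUTS)

# Share of shoot-outs ending with 0, 1, ..., 5 goals scored.
for k in range(6):
    print(k, round((goals == k).mean(), 3))
```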

Pen_sim_02

Given the conditions we've set up, it seems there's about a 30% chance of scoring all five penalties, while making four is the most likely outcome. Going goalless from this shoot-out looks rather unlikely, but as I've said, the true chance of scoring after playing 120 minutes, with the hopes of thousands (or millions) of people on your shoulders, is probably lower – so a goalless shoot-out is probably more likely than our simulation shows.

That’s it about Expected Goals for now, and as I’ve said we will explore the possibilities of Monte Carlo simulation more thoroughly later. In my next post about my model I’ll talk about the data used for building it.
