Another Allsvenskan 2016 update

As the league has now gone on a summer break for UEFA Euro 2016, let’s take another look at the Allsvenskan season so far.

2016_update_01

Since last time, Malmö have overtaken Norrköping at the top, and the early surprise side Sundsvall have dropped to 6th. AIK, Kalmar and Häcken have climbed in the table, while Djurgården, Hammarby and Helsingborg have done the opposite. Gefle and Falkenberg still struggle at the bottom.

2016_update_02

Shots-wise the league seems to have settled, as only Elfsborg and Göteborg have changed quadrants since the last update. Also, Malmö’s gap to the other clubs has decreased.

2016_update_03

Looking at effectiveness in attack, we can see partially why some sides have climbed or dropped in the league table. Malmö and Häcken have enjoyed some efficient scoring, moving them from ‘wasteful’ into the ‘constant threat’ quadrant, while Djurgården have done the opposite.

2016_update_04

Defensively, we see how AIK have been more effective at the back together with Sundsvall and Jönköpings Södra, while Djurgården’s performance has worsened.

2016_update_05

Looking at xG, we see how AIK have overtaken Malmö as the best attacking side, but have at the same time moved into the ‘worse defence’ half. Hammarby’s attacking numbers have dropped while Falkenberg have performed better. Östersund and Örebro still sit at opposite ends, with the former involved in some low-xG games and the latter producing some xG-fests with both defensive and attacking xG at about 1.8 per game.

2016_update_06

Malmö are still at the top of the xGD table, but have dropped a bit from their >1.0 from last time. Kalmar have climbed to third while Djurgården and AIK have dropped. The bottom three remain the same as last time.

2016_update_07

2016_update_08

Looking at Expected Points for a ‘fair’ table based on the shots taken and conceded so far, we see how Malmö are still at the top while Gefle are stuck at the bottom. Göteborg have overtaken AIK in the top three, while Kalmar have climbed by about 8 points out of 12 possible.

2016_update_09

A note on Expected Points Performance: Winning teams will always outperform their Expected Points, since picking up all 3 points will usually be above expectation: no team dominates a game so much as to warrant a 100% win probability. The reverse goes for teams who consistently lose, as 0 points will usually be below expectation.

2016_update_100

Looking at time spent in Game States, we see how Gefle have spent just about 10% of the season in the lead so far. Helsingborg and Häcken have spent little time drawing while Sundsvall still have spent very little time trailing.

2016_update_11

Just like with the actual league table, there are some big differences in the prediction compared to the last update, showing how difficult it can be to predict the league this early in the season. Mid-table has really opened up since last time, but the top 2 and bottom 3 remain the same.

Long-term trends and managerial changes

Usually, I would’ve ended the post here but as two managers have been sacked since the last update, I thought it would be interesting to see how AIK and Gefle have performed under Andreas Alm and Roger Sandberg respectively. I won’t comment on these plots more than that Alm likely had to leave because of politics and disputes at the club, while Sandberg was sacked due to Gefle’s poor results.

long_term_AIK

long_term_GEF


Allsvenskan 2016 update

With three rounds of Allsvenskan games played since my last post, it’s time for an update. Like last time, I’m just going to throw a few visualizations at you together with my initial thoughts without going too much in-depth.

Starting out with the league table, we see just how close the league has been so far – with eight games played, only four points separate Östersund in 11th place from Malmö in 2nd.

allsv_update_01_01

We can also see some interesting streaks since last time, with Norrköping and Elfsborg winning all three games while Hammarby, Gefle and Falkenberg have been struggling. Looking at the early surprise teams we see that Sundsvall have continued to perform well while Jönköpings Södra have dropped in the table.

allsv_update_01_02

Looking at shots we see how Hammarby, Kalmar and Norrköping have all moved into the ‘busy attack, quiet defence’ quadrant, indicating that they’ve played a bit better lately (or faced easier opposition!), while Sundsvall are still stuck in the ‘quiet attack, busy defence’ quadrant.

allsv_update_01_03

While Malmö produce a lot of shots, they’re still one of the most ineffective sides up front. Göteborg and Norrköping on the other hand are enjoying some effective scoring at the moment.

allsv_update_01_04

Sundsvall are still conceding a lot of shots, but at least they’re not converted into goals very often – which in part explains their good results so far. Elfsborg have moved into the ‘formidable’ defensive quadrant, only conceding one goal in the last three games.

allsv_update_01_05

Looking at Expected Goals, Malmö are still clearly the best team, with Norrköping improving while Djurgården have dropped a bit. Here we really see the difference between the early surprise teams’ performance recently, as Sundsvall have improved both attacking and defensive numbers while Jönköpings Södra have done the opposite.

So how would the teams rank xG-wise? Expected Goals Difference should do well as a measure of skill, and here we again see how the model ranks Malmö as the best side so far, with Norrköping and AIK the main contenders. A bottom three of Helsingborg, Gefle and Falkenberg has also emerged.

allsv_update_01_06

Another way of evaluating the teams’ performance so far is to simulate how many points on average each team would’ve received from their games. To do this I’ve used the shots from each game to simulate the result 10,000 times and the teams have then been awarded Expected Points based on the derived 1X2 probabilities.

For example, if the simulation came up with probabilities of 0.5, 0.3 and 0.2 for a home win, draw and away win respectively, then the home side would be awarded 0.5*3 + 0.3*1, or 1.8 Expected Points, while the away side would get 0.2*3 + 0.3*1, or 0.9 Expected Points.
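The arithmetic above can be written as a tiny helper. This is only a sketch; the function name `expected_points` is made up for illustration and is not taken from the actual model code.

```python
def expected_points(p_home, p_draw, p_away):
    """Return (home xP, away xP) from 1X2 probabilities.

    A win is worth 3 points, a draw 1 point and a loss 0 points.
    """
    home_xp = 3 * p_home + 1 * p_draw
    away_xp = 3 * p_away + 1 * p_draw
    return home_xp, away_xp


# The example from the text: 0.5 home win, 0.3 draw, 0.2 away win
home_xp, away_xp = expected_points(0.5, 0.3, 0.2)
print(round(home_xp, 2), round(away_xp, 2))
```

Note that the two values always sum to less than 3 whenever a draw is possible, since a draw only hands out 2 points in total.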

Here’s a table of the teams’ Expected Points so far:

allsv_update_01_07

But a team can’t get 1.8 points from a game, only 0, 1 or 3 – so how have the teams performed compared to their Expected Points?

allsv_update_01_08

Note: Malmö have been awarded a 3-0 win against Göteborg as the game was abandoned due to home fans throwing pyrotechnics towards a Malmö player. These points have been included.

Here we see how Helsingborg and Sundsvall have taken quite a lot more points than expected, while Falkenberg and Kalmar have done the opposite. This could be the result of some good/bad luck, but it can also mean that the model fails to properly assess the quality of these teams.

Let’s dig deeper and have a look at the Expected Points distribution of each team:

allsv_update_01_10

Looking at these distributions we can see just how extreme the results have been for some of the teams so far. In fact, my model estimates that if we re-played Helsingborg’s games 10,000 times, they would get 13 points or more only about 5% of the time!
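A points distribution like this can be sketched by re-playing a team’s games many times and adding up the points in each replay. The version below makes an assumption the text doesn’t spell out: every shot is treated as an independent chance that is scored with probability equal to its xG. The shot values are made up, and the function names are hypothetical.

```python
import random


def simulate_result(team_xg, opp_xg, rng):
    """Replay one game: each shot is scored with probability equal to its xG."""
    goals_for = sum(rng.random() < xg for xg in team_xg)
    goals_against = sum(rng.random() < xg for xg in opp_xg)
    return goals_for, goals_against


def points_distribution(games, n_sims=10_000, seed=1):
    """games: list of (team shot xGs, opponent shot xGs), one entry per game.

    Returns {total points: share of simulated replays}.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_sims):
        total = 0
        for team_xg, opp_xg in games:
            gf, ga = simulate_result(team_xg, opp_xg, rng)
            total += 3 if gf > ga else 1 if gf == ga else 0
        counts[total] = counts.get(total, 0) + 1
    return {pts: n / n_sims for pts, n in sorted(counts.items())}


def prob_at_least(dist, threshold):
    """Share of replays in which the team took at least `threshold` points."""
    return sum(share for pts, share in dist.items() if pts >= threshold)


# Made-up shot data for three games (not real Allsvenskan numbers)
games = [
    ([0.05, 0.10, 0.30], [0.20, 0.15, 0.40, 0.08]),
    ([0.12, 0.07], [0.35, 0.25, 0.10]),
    ([0.45, 0.05, 0.09, 0.11], [0.06, 0.30]),
]
dist = points_distribution(games)
print(dist)
print(prob_at_least(dist, 7))
```

With a team’s real shot data in `games`, `prob_at_least(dist, 13)` would be the kind of tail probability quoted above.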

Lastly, here’s my updated prediction of the final Allsvenskan 2016 table:

allsv_update_01_09

That’s it for now. I hope to be back with another update when the league has gone on break for Euro 2016, and maybe I’ll look closer at individual players then.

 


Allsvenskan 2016 so far

With 5 rounds of games played I thought it would be a good time to look at how the 2016 Allsvenskan is going. Let’s have a look at the league table so far:

2016_00

True to its rather unpredictable nature, the opening five rounds of the 2016 Allsvenskan have seen some surprises, and I don’t think anyone expected Sundsvall and newly-promoted Jönköpings Södra to be at the top! Also, last year’s top teams have been struggling a bit, but seem to have picked up the pace lately.

But never mind the table – although it never lies, it does give an unfair view of the teams’ underlying performances, especially with so few games played. To really have a look at how the teams have been coming along so far I’ve reproduced some of Ben Mayhew’s beautiful scatterplots:

2016_01

Looking at shots taken and conceded per game we can see how Malmö, Djurgården and AIK have dominated their games so far, outshooting their opponents by some margin. League leaders Sundsvall have, given their results, surprisingly spent most of their time in defence – but that’s just how Allsvenskan is.

 

2016_02

When looking closer at shooting effectiveness we see that surprise teams Sundsvall and Jönköpings Södra have been clinical in front of goal so far, partly explaining their results. Häcken on the other hand have really struggled to score.

 

2016_03

Looking at defensive effectiveness we can really see why Sundsvall are at the top of the Allsvenskan table. While spending a lot of time in defence, they’ve managed to concede very few goals given their shots faced. If this is down to some new tactic, skill or simply dumb luck remains to be seen – but for a team like Sundsvall I’m willing to say it’s the latter.

 

2016_04

Expected Goals-wise we see just how lucky Sundsvall have been so far. They’ve conceded a lot of xG while failing to produce up front, putting them in the same group as struggling sides Falkenberg, Häcken, Helsingborg and Gefle. Malmö is at the other side of the scale, producing a lot of high-quality chances while keeping a tight defence.

Another interesting thing to look at is time spent in Game States. As a result of their good performance (or luck!), Sundsvall have only spent about 1% of minutes played losing so far, while Gefle have only spent 10% in the lead!

2016_05
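For the curious, time spent in each Game State can be tallied from the minutes at which goals were scored. The following is a minimal sketch under simplifying assumptions: a fixed 90-minute game, no stoppage time, and a made-up input format of `(minute, scored_by_team)` pairs.

```python
def game_state_minutes(goal_events, match_length=90):
    """Minutes a team spent leading, drawing and trailing in one game.

    goal_events: list of (minute, scored_by_team) where scored_by_team is
    True for the team's own goals and False for goals conceded.
    """
    def label(goal_diff):
        if goal_diff > 0:
            return "leading"
        if goal_diff < 0:
            return "trailing"
        return "drawing"

    minutes = {"leading": 0, "drawing": 0, "trailing": 0}
    goal_diff = 0      # every game starts level
    last_minute = 0
    for minute, scored_by_team in sorted(goal_events):
        # Credit the time since the previous goal to the state held until now
        minutes[label(goal_diff)] += minute - last_minute
        goal_diff += 1 if scored_by_team else -1
        last_minute = minute
    # Credit the time from the last goal to the final whistle
    minutes[label(goal_diff)] += match_length - last_minute
    return minutes


# Hypothetical game: the team scores after 20 and 80 minutes, concedes after 60
minutes = game_state_minutes([(20, True), (60, False), (80, True)])
print(minutes)
```

Summing these dictionaries over all of a team’s games and dividing by total minutes played gives percentages of the kind shown in the plot.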

What about a prediction for the rest of the season then? I’ve used the games so far to fire up my league table simulation based on my Monte Carlo xG game simulation, and this is the result:

predict_2016_05

Note: The first table posted here was wrong due to a minor error in the code. This is the correct table.

As a Djurgården supporter, I kinda like the result – even though I think it’s a bit unrealistic for us to compete for silverware just yet. And anyways, a simulation of the whole season based on only 5 games tells more about what has happened so far than what we’ll see in the future, at least in my opinion.

The model clearly ranks Malmö as the best team in the league, as it’s done pretty much every season in my database, alongside AIK and Göteborg. Both the newly promoted teams, Östersund and Jönköpings Södra, seem competent xG-wise and have a good chance of staying up, while reigning champions Norrköping seem to be performing worse than last year. Gefle are always at the bottom of these kinds of tables, but nevertheless seem to outsmart every metric available to avoid relegation season after season – but maybe this is the year they finally drop down to Superettan?

I’m planning to do these kinds of updates at regular intervals, and maybe add some more plots and deeper analysis, but this will have to do for now!


A rough prediction of the new Allsvenskan season

Though I hadn’t planned on posting a prediction for the new Allsvenskan season until a couple of rounds had been played, after seeing Per Linde of fotbollssiffror posting his prediction on Twitter and mentioning how he disagreed with it, I decided to do the same and fire up my league table prediction script from last season.

My Monte Carlo game prediction is designed to use at least a couple of rounds of data, so I was unsure how it would go about predicting a new season right from scratch, but I actually think it turned out better than expected:

predict_2016_01

There are some obvious problems though. First off, the script still thinks it’s 2015 and Jönköpings Södra and Östersund are playing in Superettan, causing some strange error where their every game is simulated as a 0-0 draw. This obviously skews the prediction for every team, but it isn’t really an error as the league simulation script isn’t designed to involve different leagues, and it’ll be corrected when I update the database with the weekend’s results.

Also, every game is simulated with the teams’ squads as they were at the end of the 2015 season which is obviously a problem, with a lot of players coming and going since then – but again this will be fine once I update the database.

What about the actual prediction then? Besides the error with the promoted teams the only problem I have subjectively is the high percentages for Gefle’s relegation (they’ve been ruled out as long as I can remember but have still managed to stay up year after year) and Norrköping’s title defence. I’d also switch places between Djurgården and Häcken while placing Örebro somewhere in lower mid-table. The promoted sides are hard to predict, but I’d definitely place Östersund above Jönköpings Södra.

Though I’m pretty happy with the prediction, I’ll update it in another post once a couple of rounds have been played.


How important is the starting line-up when predicting games?

As I mentioned when doing the betting backtest for my Expected Goals model, my Monte Carlo game simulation is done on player level to account for missing players, which in theory would affect the game a lot. The simulation involves a very simple prediction of the starting line-up for each team in each game – but how would the backtest result look if I somehow could look into the future and actually know which players would be starting the game?

To test this I’ve simulated every game from the 2015 Allsvenskan season again, using my second model with more heavily weighted home field advantage – but this time used the actual line-ups instead of having the model guess. For the backtest I’ve again used odds from Pinnacle and Matchbook, but won’t bore you with the results from both as they’re much the same. Here’s the model’s results betting at Matchbook:

lineup_01 lineup_02

As expected, knowing the correct line-up really boosts the model’s predictions, as it now makes a profit pretty much across the board. Just like with the previous backtests, the 1X2 market looks ridiculously profitable as the model is very good at finding value in underdogs.

Let’s compare the results with those from Model 2:

lineup_03

The numbers in this table represent the net difference in results for the two models. In general, Model 3 makes fewer bets at lower odds, but has a much higher win percentage – hence the bigger profit. Remember, the only difference between these models is that Model 3 uses the actual line-up for each game, while Model 2 has to guess.

So could these results be used to develop a betting strategy? Using the actual line-ups for the simulation, the opening odds are of course not available to bet on since they are often posted a week or so before each game while the line-ups are released only an hour before kick-off. But as the game simulation only takes about a minute per game, it’s certainly possible to wait for the line-ups to be released before doing the simulation and then bet on whatever the model deems to be value.


Putting the model to the test: Game simulation and Expected Goals vs. the betting market

With the regular Allsvenskan season and qualification play-off both being over months ago, instead of doing a season summary (fotbollssiffror and stryktipset i sista stund have already done that perfectly fine), I thought I’d see how my model has been performing on the betting market this season. Since my interest in football analytics comes mainly from its use in betting, this is the best test of a model for me. Though I usually don’t bet on Allsvenskan, if the model can beat the market, I’m interested.

Game simulation

To do this, I should first say a few things about how I simulate games. I want my simulations to resemble whatever they are supposed to model as much as possible, and because of this I’ve chosen not to use a Poisson regression model or anything remotely like that. Instead I’ve built my own Monte Carlo game simulation in order to emulate a real football game as closely as possible.

I won’t go into any details about exactly how the simulation is done, but the main steps include:

  • Weighting the data for both sides to account for home field advantage.
  • Predicting starting lineups for each team using their most recent lineup, minutes played and known unavailable players.
  • Simulating a number of shots for each player, based on his shot numbers and the attacking and defensive characteristics of both teams.
  • Simulating an xG value for each shot, based on the player’s xG numbers and attacking/defensive characteristics of both teams.
  • Simulating the outcome of each shot given its xG value, and recording any goals.

Each game is simulated 10,000 times, obviously based only on data available prior to that particular game.
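Since the exact method isn’t described here, the following is only a rough sketch of what a player-level Monte Carlo loop along these lines could look like, not the actual implementation. Everything in it is an assumption made for illustration: lineups are taken as given rather than predicted, shot counts and xG values are simply resampled from each player’s (made-up) history, home field advantage is a flat multiplier on xG, and the opponent’s defensive characteristics are ignored.

```python
import random


def simulate_team_goals(lineup, xg_weight, rng):
    """Simulate one team's goals in a single game.

    lineup: list of player dicts with hypothetical keys
      'shot_counts' (shots taken in past games) and
      'shot_xg'     (xG values of past shots).
    xg_weight: multiplier standing in for home field advantage weighting.
    """
    goals = 0
    for player in lineup:
        n_shots = rng.choice(player["shot_counts"])   # shots in this game
        for _ in range(n_shots):
            xg = min(1.0, rng.choice(player["shot_xg"]) * xg_weight)
            if rng.random() < xg:                     # shot outcome
                goals += 1
    return goals


def simulate_game(home_lineup, away_lineup, n_sims=10_000,
                  home_weight=1.1, seed=1):
    """Return {(home goals, away goals): share of simulations}."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_sims):
        score = (simulate_team_goals(home_lineup, home_weight, rng),
                 simulate_team_goals(away_lineup, 1.0, rng))
        counts[score] = counts.get(score, 0) + 1
    return {score: n / n_sims for score, n in counts.items()}


# Two made-up three-player "lineups", for illustration only
home = [
    {"shot_counts": [0, 1, 2, 3], "shot_xg": [0.05, 0.12, 0.30]},
    {"shot_counts": [0, 0, 1, 2], "shot_xg": [0.04, 0.08, 0.15]},
    {"shot_counts": [0, 1, 1],    "shot_xg": [0.03, 0.06]},
]
away = [
    {"shot_counts": [1, 2, 2, 4], "shot_xg": [0.06, 0.10, 0.25]},
    {"shot_counts": [0, 1, 1],    "shot_xg": [0.05, 0.09]},
    {"shot_counts": [0, 0, 1],    "shot_xg": [0.02, 0.07]},
]
score_probs = simulate_game(home, away)
print(sorted(score_probs.items(), key=lambda kv: -kv[1])[:5])
```

Because goals are built up player by player, leaving a player out of the lineup removes his shots from the simulation without any extra bookkeeping.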

The biggest advantage of this approach is that it’s easy to account for missing players; in fact, it’s done automatically. It also seems more straightforward and easily understood than other methods, at least to me. Another big plus is that it’s fairly easy to modify the Monte Carlo algorithm in order to try new things and incorporate different data. The drawbacks include the time it takes to simulate each game. At 10,000 simulations per game it takes about a minute, meaning that simulating a full 240-game Allsvenskan season would take at least 4 hours. Also, since my simulations rely heavily on up-to-date squad info, such a database has to be maintained, but this can be automated if you know where to look for the data.

For each game, the end result of all these simulations is a set of probabilities for each possible (and impossible!?) result, which can then be used to calculate win percentages and fair odds for any bet on the 1X2, Asian Handicap and Over/Under markets.
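As a sketch of that last step, here is how a set of scoreline probabilities can be collapsed into 1X2 and Over/Under probabilities. The scoreline numbers are invented, the function name is hypothetical, and only a half-goal total line is handled so that pushes (and Asian Handicap quarter lines) don’t complicate things.

```python
def market_probabilities(score_probs, total_line=2.5):
    """Collapse {(home goals, away goals): probability} into market probabilities."""
    home = sum(p for (h, a), p in score_probs.items() if h > a)
    draw = sum(p for (h, a), p in score_probs.items() if h == a)
    away = sum(p for (h, a), p in score_probs.items() if h < a)
    over = sum(p for (h, a), p in score_probs.items() if h + a > total_line)
    return {"1": home, "X": draw, "2": away, "over": over, "under": 1 - over}


# Invented scoreline probabilities that sum to 1
score_probs = {
    (0, 0): 0.10, (1, 0): 0.20, (1, 1): 0.15, (2, 1): 0.20,
    (0, 1): 0.15, (1, 2): 0.10, (2, 2): 0.10,
}
markets = market_probabilities(score_probs)
print({k: round(v, 2) for k, v in markets.items()})
```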

As an example of how the end result of the simulation looks, I’ve simulated a fictive Stockholm Twin Derby game, Djurgården vs. AIK. Here’s how my model would predict this game if it were to be played today (using last season’s squads, I haven’t accounted for new signings and players leaving yet):

game_sim_01

Given these numbers the fair odds for the 1X2 market would be about 2.31-3.62-3.44 while the Asian Handicap would be set at Djurgården -0.25 with fair odds at about 1.99-2.01 for the home and away sides respectively. The total would be set at 2.25 goals, with fair odds for Over/Under at about 2.04-1.96.
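Fair decimal odds are simply the inverse of the probability, with no bookmaker margin added. As a quick sanity check, converting the quoted 1X2 odds back to probabilities should give numbers that sum to roughly 1. The helper names below are made up for this illustration.

```python
def fair_odds(probability):
    """Decimal odds at which a bet on this outcome has zero expected value."""
    return 1 / probability


def implied_probability(odds):
    """Probability implied by decimal odds, ignoring any margin."""
    return 1 / odds


quoted = (2.31, 3.62, 3.44)  # the 1X2 fair odds from the text
probs = [implied_probability(o) for o in quoted]
print([round(p, 3) for p in probs], round(sum(probs), 3))
```

Bookmaker odds converted the same way would sum to more than 1, the excess being the margin.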

Backtesting against the market

With my odds history database containing odds from over 50 bookmakers and the fact that timing and exploiting odds movements is a big part of a successful betting strategy, it’s not a simple task to backtest a model over a full season properly. I’ve however tried to make it as easy as possible and set out some rules for the backtesting:

  • The backtest is based on 1X2, Asian Handicap and Over/Under markets.
  • Only odds from leading bookmaker Pinnacle and betting exchange Matchbook are used. Maybe I’ll run the backtest against every available bookmaker in order to find out which is best/worst at setting its lines for a later post.
  • Two variations of the Monte Carlo match simulation are tested, where Model 2 weights home field advantage more heavily.
  • Only opening and closing odds are used in an attempt at simulating a simple, repeatable betting strategy.
  • For simplicity, the stake of each bet is 1 unit.
  • Since my model seems to disagree quite strongly with the bookies on almost every single game, there seem to be high-value bets suspiciously often. To get the number of bets down to a plausible level, I’ve applied a minimum Expected Value threshold of 0.5. As EV this high is usually only seen in big underdogs, this may be an indicator that my model is good at finding these kinds of bets, or that it is completely useless.
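To make the rules above concrete, here is a minimal sketch of a flat-stake backtest with an EV filter. It assumes EV is measured per unit staked as model probability times decimal odds minus 1, and that ROI means profit divided by total stakes; neither definition is confirmed by the text, and the candidate bets are invented.

```python
def expected_value(model_prob, odds):
    """Expected profit per unit staked, according to the model."""
    return model_prob * odds - 1


def backtest(candidates, min_ev=0.5):
    """candidates: list of (model probability, decimal odds, bet won?).

    Places a 1-unit bet on every candidate whose EV reaches the threshold.
    Returns (number of bets, profit in units, ROI).
    """
    n_bets, profit = 0, 0.0
    for model_prob, odds, won in candidates:
        if expected_value(model_prob, odds) >= min_ev:
            n_bets += 1
            profit += (odds - 1) if won else -1
    roi = profit / n_bets if n_bets else 0.0
    return n_bets, profit, roi


# Invented candidate bets, not real odds or results
candidates = [
    (0.40, 4.00, True),    # EV 0.60: bet placed, wins 3 units
    (0.30, 6.00, False),   # EV 0.80: bet placed, loses 1 unit
    (0.50, 2.10, True),    # EV 0.05: below threshold, skipped
    (0.25, 7.00, False),   # EV 0.75: bet placed, loses 1 unit
]
n_bets, profit, roi = backtest(candidates)
print(n_bets, profit, round(roi, 3))
```

A threshold this high filters out short-priced favourites almost entirely, which is consistent with the focus on underdogs noted above.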

So let’s take a look at the results of the backtest – first off we have the bookmaker Pinnacle. Here are the results plotted over time:

Pinnacle

backtest_01

backtest_02

 

backtest_03

We can immediately see from the results table that the model indeed focuses on underdogs and higher odds. Set against Pinnacle, both variations of the model seem to be profitable on the 1X2 market, with Model 2 (with more weight on home field advantage) performing better with a massive 1.448 ROI.

Both models recorded a loss on the Asian Handicap market and only Model 1 made a profit in the Over/Unders – a disappointment as these are the markets I mostly bet on.

The table above contains bets on both opening and closing odds – let’s separate the two and see what we can learn:

backtest_04

Looking at these numbers we see that both models perform slightly better against the closing odds on the 1X2 market, while Model 2 actually made a tiny profit against the closing AH odds. We can also see that Model 1’s profit on Over/Unders came mostly from opening odds.

But what about the different outcomes to bet on? Let’s complicate things further:

backtest_05

So what can we learn from this ridiculous table? Well, the profit in the 1X2 market comes mainly from betting away teams which suits the notion that the model is good at picking out highly underestimated underdogs. Contrary to the 1X2 market, betting home sides on the Asian Handicap markets seems more profitable than away sides. Lastly the model has been more profitable betting overs than unders.

As we’ve seen, my model seems to be good at finding underestimated underdogs, and at Pinnacle this bias mostly exists in the 1X2 market, hence the huge profit.

Matchbook

But what about the betting exchange Matchbook, where you actually bet directly against other gamblers?

backtest_06

backtest_07

backtest_08

The 1X2 market seems to be highly profitable at Matchbook too, and Model 1 actually made a nice profit on AH, especially away sides – in contrast to the results at Pinnacle. Also, the mean odds here are centered around even money. Over/Unders again seem to be a lost cause for my model.

Conclusion

As I’ve mentioned, the model seems best at finding underdogs and high odds which are just too highly priced, and looking at the time plots we can see that these bets occur mostly in the opening months of the season. This may be an indicator of how the market after some time adjusts to surprise teams like this season’s Norrköping.

For a deeper analysis of the backtest I could have looked at how results differed for minus vs. plus handicaps on the AH market, and high vs. low O/U lines. Using different minimum EV thresholds would certainly change things and different staking plans like Kelly could also have been included, but I left it all out so as not to overcomplicate things too much.

I feel I should emphasize that the different conclusions made concerning betting strategy from this backtest only apply to my model, and not Allsvenskan or football betting in general.

As we’ve seen, an Expected Goals model and Monte Carlo match simulation can indeed be used to profit on Allsvenskan. However, the result of any betting strategy depends highly on not only the model, but also when, where and what you bet on.


Preview: Allsvenskan Qualification Play-off

With the regular Swedish season being over and Norrköping crowned champions, all that’s left now is to decide who’ll get the last spot in next year’s Allsvenskan. In this qualification play-off, Sirius, who finished 3rd in Superettan, are pitted against Allsvenskan’s 14th-placed Falkenberg in a two-game battle.

Let’s have a look at some stats for the teams, compared to both the teams in Allsvenskan (blue) and Superettan (red):

play_off_01

From this graph, Sirius actually look really good, with an especially strong defence even when compared to the Allsvenskan teams, while Falkenberg’s defence looks really poor. However, this doesn’t say much about how the teams compare to each other since Falkenberg have had to face far tougher opponents in Allsvenskan.

play_off_02 play_off_03

Looking at the xG maps, what again stands out is the defensive performances of the teams. While Falkenberg have conceded a massive 415 shots, almost 14 per game, Sirius have only conceded 241 shots or about 8 per game. Not only that, Sirius’ xG per conceded attempt is 0.111 while Falkenberg’s is a staggering 0.154, meaning they concede shots in quite bad (for them) situations – not a good thing.

play_off_04

Looking at individual players we can see how Sirius’ Stefan Silva is the big overperformer here with his 12 goals almost doubling his xG numbers. Also, Falkenberg seem to have more goalscoring options with three players over 6 goals while Sirius only have Silva.

As always, I’m not willing to present any prediction for individual games, but here I had hoped to show the results of a simulation covering both play-off games including possible extra time and penalty shoot-out. I have run such a simulation, however I’m not happy with the results as my model seems to be favouring Sirius too heavily. This is almost certainly due to the different leagues involved, making Sirius look way better than they would be against an Allsvenskan side. Since I only came up with the idea of writing this post this morning, I haven’t had the time to look into a possible league strength variable to use in the simulation.

But if I had to guess, I’d say that Sirius look like a really strong side and should possibly be considered favourites for promotion here, mostly due to Falkenberg’s nasty habit of conceding a lot of shots with high goal expectancies.


Preview: AIK vs. IFK Göteborg

Monday night AIK will host IFK Göteborg for an extremely important game in the race for the Allsvenskan title. Both teams are close behind Norrköping in the lead and will surely go for the win here to challenge for the title, and I thought it would be a good idea to have a look at some team stats as a preview to this crucial game.

The plot below contains goals, shots, Expected Goals, xG per attempt, goal conversion % and shot on target % – both for and against, normalized per game where necessary. Home and away stats for each team in the league are separated with home in blue and away in red. For each subplot the lower right corner is preferable, with high offensive and low defensive numbers.

aik_gbg_03

Besides SoT%, both AIK and Göteborg appear to be among the best in the league in each stat, which partly explains why they are fighting for the title. What is really striking though, and could be seen as an indicator of team style, is that while AIK’s offensive numbers at home are really good, Göteborg’s strength when playing away is their defence.

aik_gbg_01 aik_gbg_02

This is also evident from each team’s xG maps, where it is clear that AIK’s main strength is their attacking power and ability to produce high volumes of shots with high xG values each game. Göteborg on the other hand rely heavily on their defensive skills to protect their box and limit the opposition’s scoring chances. This clash of styles adds yet another interesting flavor to an already interesting game.

aik_gbg_04

Looking at each team’s top 5 goalscorers it is clear that AIK’s impressive attack relies heavily on Henok Goitom. His 16 goals this season are pretty much in line with his xG of about 15, while Göteborg’s Søren Rieks seems to be overperforming with his 10 goals equalling almost two times his xG numbers. Both teams have sold one of their best offensive players, with Bahoui and Vibe both making a move abroad this summer.

What about a prediction then? While I won’t reveal any percentages for this (or any) game, what I can say is that my model is pretty much in tune with the betting market. AIK is a slight favourite due to their home advantage, but this is really anybody’s game and it will hopefully be highly entertaining.


Predicting the final Allsvenskan table

With the Swedish season soon coming to an end it’s a good time to try out how the Expected Goals model will predict the final table. With only three games left a top trio consisting of this season’s big surprise Norrköping just in front of Göteborg and AIK are competing for the title as Swedish Champion. At the opposite end of the table Åtvidaberg, Halmstad and Falkenberg look pretty stuck, with the two latter teams battling it out for the possible salvation of the 14th place relegation play-off spot.

predict_table_01

Let’s take a look at the remaining schedule for the top three teams:

Norrköping have two tough away games left against Elfsborg and Malmö, who are both locked in a duel for 4th place, which could potentially mean a place in the Europa League qualification. Elfsborg are probably the tougher opponent here, with reigning champions Malmö busy in the Champions League group stage. Between these two away games Norrköping will play at home against Halmstad, who are fighting for survival at the bottom of the table.

Göteborg have two tough away games themselves, first off at Djurgården and later a very important game against fellow title contenders AIK. This game will probably decide which of the two will challenge Norrköping for the title in the last round. Göteborg finish the season at home to Kalmar, who could possibly be playing for their survival in this last game.

AIK have the best remaining schedule of the three top teams, with away games at Halmstad and Örebro on either side of the crucial home game against Göteborg. As mentioned, Halmstad are fighting for their existence in Allsvenskan, while Örebro’s recent great form has seen them through to a safe spot in the table.

At this late stage of the season there are a lot of psychological factors in play, with the motivation and spirit of teams and players often being connected to their position in the table. These aspects are very hard to quantify and have not been incorporated in my model. So my prediction of the table relies solely on my Expected Goals model used in Monte Carlo simulation. I won’t reveal exactly how I simulate games but the subject will probably be touched upon in a later post so I’ll spare you any boring technical details for now.

Each of the remaining 24 individual games has been simulated 10,000 times. For each of these fictional seasons I’ve counted up the points, goals scored and goal differences for every team to come up with a final table for that season. Lastly I’ve combined all these seasons into a table with expected points and probabilities of each team’s possible league positions.
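The season-level bookkeeping can be sketched roughly as below. This is a simplified illustration and not the script used for the prediction: each remaining game is drawn from fixed 1X2 probabilities instead of a full game simulation, teams level on points are separated at random rather than by goal difference and goals scored, and the teams, points and probabilities are placeholders.

```python
import random


def simulate_table(points, fixtures, n_sims=10_000, seed=1):
    """points: {team: current points}.
    fixtures: list of (home, away, p_home_win, p_draw) for remaining games.

    Returns (expected final points per team,
             {team: [P(finishing 1st), P(finishing 2nd), ...]}).
    """
    rng = random.Random(seed)
    teams = list(points)
    position_counts = {t: [0] * len(teams) for t in teams}
    points_sum = {t: 0 for t in teams}
    for _ in range(n_sims):
        season = dict(points)
        for home, away, p_home, p_draw in fixtures:
            u = rng.random()
            if u < p_home:
                season[home] += 3
            elif u < p_home + p_draw:
                season[home] += 1
                season[away] += 1
            else:
                season[away] += 3
        # Rank by points; a random number breaks ties (simplification)
        ranking = sorted(teams, key=lambda t: (season[t], rng.random()),
                         reverse=True)
        for pos, team in enumerate(ranking):
            position_counts[team][pos] += 1
            points_sum[team] += season[team]
    expected = {t: points_sum[t] / n_sims for t in teams}
    position_probs = {t: [c / n_sims for c in position_counts[t]] for t in teams}
    return expected, position_probs


# Placeholder teams and numbers, not the real table
points = {"A": 50, "B": 49, "C": 47}
fixtures = [("A", "B", 0.50, 0.30), ("C", "A", 0.40, 0.30), ("B", "C", 0.45, 0.25)]
expected, position_probs = simulate_table(points, fixtures)
print(expected)
print(position_probs)
```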

predict_table_03

The model clearly ranks Norrköping as the most likely winner with Göteborg as the main contender, while AIK’s chances of winning the title are only about 18%. The bottom three look rather fixed in their current positions, with Falkenberg having only a 2% chance of overtaking Kalmar for the last safe spot in the table. At mid-table things are still quite open, even though Djurgården’s season is pretty much over with an 89% chance of placing 6th. Malmö seem to have an advantage over Elfsborg in the race for 4th place, but given their Champions League schedule their chances should probably be lower than the model predicts.

I’ll probably be posting updated predictions on my Twitter feed after each of the top teams’ remaining games to see how the results change the predictions.


The Model part 3 – Expected Goals for Swedish Allsvenskan

Now that we’ve explored the Expected Goals concept and the data available for Swedish football, it’s finally time to build the model and put it to the test.

Setting up an Expected Goals model can be done in a number of ways, for example with the help of exponential decay, machine learning or some kind of regression model. I’ve chosen to use a logistic regression model because I think it has several advantages. Logistic regression is mostly used when the dependent variable only has two possible values, which translates well to football since a shot can end up either as a goal (1) or no goal (0). Also, logistic regression returns a calculated probability, i.e. our xG value. It’s also very easy to set up a logistic regression model and tinker with different variables using Python’s statsmodels library.
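To make the "returns a probability" point concrete, here is a small sketch of how a fitted logistic regression turns shot features into an xG value. The intercept and coefficients are hypothetical numbers picked for readability, not the values from the model in this post.

```python
import math


def xg_from_logit(intercept, coefs, features):
    """Logistic regression prediction: P(goal) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only to make the example readable.
INTERCEPT = -1.0
COEFS = {"distance": -0.1, "is_penalty": 2.5}

close_shot = xg_from_logit(INTERCEPT, COEFS, {"distance": 8.0, "is_penalty": 0})
long_shot = xg_from_logit(INTERCEPT, COEFS, {"distance": 30.0, "is_penalty": 0})
penalty = xg_from_logit(INTERCEPT, COEFS, {"distance": 11.0, "is_penalty": 1})
print(round(close_shot, 3), round(long_shot, 3), round(penalty, 3))
```

Whatever the inputs, the output always lies strictly between 0 and 1, which is what makes it usable as a goal probability.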

First off, the dataset needs to be divided into two parts: one for training or constructing the model and one for testing it. This is done to avoid evaluating the model on the same data that was used to construct it, which would hide any overfitting and could make the model look better than it is.
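A minimal sketch of such a split, assuming each shot is stored as a dict with a `season` key (a hypothetical layout). Later in the post, seasons 2011-2014 are used for training and 2015 for testing, so the split here is done by season rather than at random.

```python
def split_by_season(shots, test_seasons):
    """Return (train, test): shots from test_seasons are held out for evaluation."""
    train = [shot for shot in shots if shot["season"] not in test_seasons]
    test = [shot for shot in shots if shot["season"] in test_seasons]
    return train, test


# Made-up shots; only the season field matters for the split.
shots = [
    {"season": 2011, "distance": 12.0, "goal": 0},
    {"season": 2013, "distance": 7.5, "goal": 1},
    {"season": 2014, "distance": 22.0, "goal": 0},
    {"season": 2015, "distance": 9.0, "goal": 1},
    {"season": 2015, "distance": 18.0, "goal": 0},
]
train, test = split_by_season(shots, test_seasons={2015})
print(len(train), "training shots,", len(test), "test shots")
```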

I’ve chosen a number of variables which all in some way make sense to test in the model. They include:

  • League: Could goal expectancy differ between Allsvenskan and Superettan? Since this variable isn’t numerical, it’s been recoded to either 0 (Allsvenskan) or 1 (Superettan).
  • Attempt type: That the goal expectancy for regular shots and penalties is completely different is obvious to anyone interested in football. This variable has also been recoded to either 0 (shot) or 1 (penalty).
  • Distance to the center of the goal: It is probably easier to score the closer to goal the shot is taken.
  • Angle: The distance to goal doesn’t tell the whole story of the importance of shot location, as shots taken from the same distance but at different angles should at least have different expectancies. Higher angle means a more central position, which would probably be easier to score from.
  • Game State: There’s been some work done on the importance of game state in football, and its use can be debated, but I’m at least going to try it out. It works by crediting teams who spend time having the lead. Teams start every game level at Game State 0. Going 1-0 up means a Game State of +1 while the trailing team’s Game State drops to -1, and so on.
  • Number of players on the pitch: I think this is a first for using number of players in Expected Goals models, at least I haven’t seen anybody use it before. I’ve decided to call it Man Strength for lack of a better term and it works much like Game State. If an opponent is sent off your Man Strength goes to +1, while it drops to -1 for the opposing side. The reasoning behind using a variable like this is that as you face fewer opponents the defensive pressure could be less than usual, resulting in a higher goal expectancy.
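The variables in the list above could be derived roughly as follows. This is a sketch under assumptions: shot coordinates are taken to be in metres with the goal center at (0, 0), and the angle is taken to be the angle between the two posts as seen from the shot location, which matches "higher angle means a more central position" but may not be the exact definition used in the post. All function names are hypothetical.

```python
import math

GOAL_WIDTH = 7.32  # metres between the posts


def distance_to_goal(x, y):
    """Distance to the goal center; x is metres from the goal line, y the lateral offset."""
    return math.hypot(x, y)


def goal_angle(x, y):
    """Angle in degrees between the two posts as seen from the shot location (assumed definition)."""
    to_near_post = math.atan2(y + GOAL_WIDTH / 2, x)
    to_far_post = math.atan2(y - GOAL_WIDTH / 2, x)
    return math.degrees(to_near_post - to_far_post)


def game_state(goals_for, goals_against):
    """0 when level, +1 when leading by one, -1 when trailing by one, and so on."""
    return goals_for - goals_against


def man_strength(own_players, opponent_players):
    """+1 after an opponent is sent off, -1 for the side that is a player down."""
    return own_players - opponent_players


def recode(league, attempt_type):
    """Recode the two categorical variables to 0/1 as described in the list."""
    return int(league == "Superettan"), int(attempt_type == "penalty")


print(round(distance_to_goal(11.0, 0.0), 1), round(goal_angle(11.0, 0.0), 1))
print(round(distance_to_goal(11.0, 15.0), 1), round(goal_angle(11.0, 15.0), 1))
print(game_state(1, 0), man_strength(11, 10), recode("Allsvenskan", "penalty"))
```

A shot from a wide position has both a longer distance and a smaller angle than a central shot taken the same distance from the goal line, which is why the two variables carry related but not identical information.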

model_01

Let’s take a look at the individual goal expectancy for the variables. Goal expectancy for the two leagues is very similar, but the league variable could possibly be of use if it interacts with the other variables differently. Attempt type is pretty obvious, with penalties having a higher value than regular shots. In the graphs showing distance and angle the values have been rounded off for presentation, while higher precision is used in the model. There are some outliers here due to small sample size at the higher values, but the patterns seem clear. It’s hard to tell from the graph if Game State is of any use since there isn’t much difference between the levels. But Man Strength shows a clear pattern: it certainly looks like goal expectancy rises when having more players on the pitch.

So let’s throw the training dataset (seasons 2011-2014) into a logistic model and have a look at a summary of the results:

model_02
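The summary above comes from statsmodels. To show what the fitting step does under the hood, here is a pure-Python stand-in (explicitly not the statsmodels code behind this post) that fits a logistic regression by Newton's method on synthetic shots. The data, the "true" coefficients used to generate it and the restriction to two predictors are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, labels, n_iter=25):
    """Maximum-likelihood fit via Newton's method; each row starts with 1.0 for the intercept."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y in zip(rows, labels):
            p = sigmoid(sum(b * v for b, v in zip(beta, x)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (y - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Synthetic shots: goal probability falls with distance, penalties are much easier.
rng = random.Random(0)
rows, labels = [], []
for _ in range(2000):
    is_penalty = 1.0 if rng.random() < 0.05 else 0.0
    distance = 11.0 if is_penalty else rng.uniform(3.0, 35.0)
    true_p = sigmoid(-0.5 - 0.12 * distance + 3.0 * is_penalty)
    rows.append([1.0, distance, is_penalty])
    labels.append(1 if rng.random() < true_p else 0)

intercept, b_distance, b_penalty = fit_logistic(rows, labels)
print(round(intercept, 2), round(b_distance, 3), round(b_penalty, 2))
```

With statsmodels the same fit would typically be a call to `sm.Logit(y, X).fit()`, which also reports the standard errors and p-values shown in the summary.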

There are a lot of numbers here, but let’s just focus on the p-values for each variable. Every variable is significant at the 5% significance level (p<0.05) except league. As expected from the plot above, there’s apparently no use in separating Allsvenskan and Superettan shots. Here’s how the model summary looks without the league variable:

model_03
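For reference, the p-values in a summary like this are usually Wald tests: the coefficient divided by its standard error is compared against a standard normal distribution. The sketch below shows that calculation and the p<0.05 filter; the coefficients and standard errors are hypothetical, not the ones in the tables.

```python
import math


def wald_p_value(coef, std_err):
    """Two-sided p-value for H0: coef == 0, using the normal approximation."""
    z = coef / std_err
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical (coefficient, standard error) pairs for illustration only.
summary = {
    "league": (0.03, 0.05),
    "distance": (-0.12, 0.01),
    "man_strength": (0.25, 0.10),
}
p_values = {name: wald_p_value(coef, se) for name, (coef, se) in summary.items()}
significant = [name for name, p in p_values.items() if p < 0.05]
print(p_values)
print("kept in the model:", significant)
```

Dropping a variable that fails the test and refitting, as done with league here, gives a simpler model whose remaining coefficients may shift slightly.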

So, with only significant variables left in the model, how does it perform when compared to actual goals? I’ve had the model calculate total xG for each player in Allsvenskan and Superettan for our test season (2015), and plotted this against their actual goals scored the same season.

model_04

With an r-squared of 0.77 I’d say the model is performing pretty well. What’s more encouraging is that the slope of the fitted line seems to be very close to 1, meaning that 1 expected goal is pretty much equal to 1 actual goal scored.

model_05

In the graph I’ve also plotted the players in the top 10 in either goals scored, xG, goals per 90 or xG per 90 for the season. Some of them have good numbers in several of the stats. Emir Kujovic and Henok Goitom for example are performing outstandingly this season, both being crucial to their respective teams’ runs at the title. Markus Rosenberg on the other hand is underperforming with only 9 goals scored compared to his 16 expected goals, which is one of the reasons why Malmö are not living up to expectations this season. Örebro’s Broberg and Häcken’s Paulinho de Oliveira also make the list due to their great form in recent months, while Djurgården’s Mushekwi enjoyed a good goalscoring run in the first half of the season.
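The player-level check described above (total xG per player plotted against actual goals, with an r-squared and a fitted slope) can be sketched as below. The shots and player names are made up, and the ordinary least squares fit is an assumption about how the line in the graph was fitted.

```python
from collections import defaultdict


def player_totals(shots):
    """Sum xG and goals per player from a list of (player, xg, goal) shots."""
    totals = defaultdict(lambda: [0.0, 0])
    for player, xg, goal in shots:
        totals[player][0] += xg
        totals[player][1] += goal
    return dict(totals)


def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus r-squared."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up shots: (player, xG of the shot, 1 if it was a goal else 0).
shots = [
    ("Player A", 0.40, 1), ("Player A", 0.10, 0), ("Player A", 0.75, 1),
    ("Player B", 0.05, 0), ("Player B", 0.30, 0), ("Player B", 0.20, 1),
    ("Player C", 0.08, 0), ("Player C", 0.12, 0),
]
totals = player_totals(shots)
xg_values = [xg for xg, _ in totals.values()]
goal_values = [goals for _, goals in totals.values()]
slope, intercept, r_squared = linear_fit(xg_values, goal_values)
print(totals)
print(round(slope, 2), round(intercept, 2), round(r_squared, 2))
```

A slope near 1 with an intercept near 0 is what "1 expected goal is pretty much equal to 1 actual goal" means in practice.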

Let’s take a look at how the model performs on a team level:

model_06 model_07

On a team level, it looks like the model is performing better when it comes to xG against than xG for, but overall it is a reasonably good fit, although not as good as at player level.

model_08

As we can see, the top teams are all performing well offensively. Göteborg stand out defensively with only 17 goals against in 27 games, even outperforming their excellent xG against of about 24. At the other end of the scale, Halmstad’s attack is underperforming with only 18 goals compared to over 35 xG.
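Team-level numbers like these come from adding up the xG of every shot a team takes and concedes. A minimal sketch, assuming each shot is a dict with `team`, `opponent`, `xg` and `goal` keys (a hypothetical layout) and using made-up shots:

```python
from collections import defaultdict


def team_totals(shots):
    """Aggregate goals and xG, for and against, per team."""
    totals = defaultdict(
        lambda: {"goals_for": 0, "xg_for": 0.0, "goals_against": 0, "xg_against": 0.0}
    )
    for shot in shots:
        attacking = totals[shot["team"]]
        defending = totals[shot["opponent"]]
        attacking["goals_for"] += shot["goal"]
        attacking["xg_for"] += shot["xg"]
        defending["goals_against"] += shot["goal"]
        defending["xg_against"] += shot["xg"]
    return dict(totals)


# Made-up shots between two hypothetical teams.
shots = [
    {"team": "Team A", "opponent": "Team B", "xg": 0.30, "goal": 1},
    {"team": "Team A", "opponent": "Team B", "xg": 0.10, "goal": 0},
    {"team": "Team B", "opponent": "Team A", "xg": 0.75, "goal": 0},
]
for team, row in team_totals(shots).items():
    attack_diff = row["goals_for"] - row["xg_for"]          # positive: scoring more than expected
    defence_diff = row["xg_against"] - row["goals_against"]  # positive: conceding fewer than expected
    print(team, round(attack_diff, 2), round(defence_diff, 2))
```

On the figures quoted above, Halmstad's attack would show a difference of roughly 18 - 35 = -17 goals, and Göteborg's defence roughly 24 - 17 = +7.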

model_09

That’s it for now when it comes to building my Expected Goals model for Swedish football, but I will probably bring it up again if I make any improvements and just maybe I’ll show how it’s been performing on the betting market. In my next post I’ll see how my model predicts the final table. Who will it pick as champion?
