Allsvenskan and Superettan player update

With Allsvenskan back in action after the summer break and the transfer window approaching, I thought it would be interesting to take a look at some individual player stats. Some players have already left the league for new challenges abroad, and when the window opens on July 15th we’ll likely see some moves between Swedish clubs as well. Given the limitations of my database, this post will focus on attacking players as the most advanced data available is shot coordinates.

I usually don’t write about Sweden’s second tier Superettan, simply because I don’t follow it at all, but as there are several players who could move to Allsvenskan and even abroad, I’ve also looked at players from this league.

So let’s start by looking at which players have produced the highest number of goals, assists and Expected Goals so far this season. I’ve only included players who have played at least a third of the minutes available so far (390 and 450 mins for Allsvenskan and Superettan, respectively) and also looked closer at younger players, splitting them into two groups: players aged 20-23 and players aged up to 19.
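As a rough sketch of the filtering described above, the minutes cut-off and the per-90 scaling could look something like this. The player records, field names and helper functions are made up for illustration; only the 390-minute threshold comes from the text.

```python
# Hypothetical season totals; names and numbers are invented for illustration.
players = [
    {"name": "Player A", "minutes": 900, "goals": 6, "assists": 2, "xg": 5.1},
    {"name": "Player B", "minutes": 300, "goals": 3, "assists": 0, "xg": 1.2},
    {"name": "Player C", "minutes": 450, "goals": 1, "assists": 2, "xg": 2.0},
]

MIN_MINUTES = 390  # Allsvenskan cut-off from the text; Superettan would use 450


def per_90(total, minutes):
    """Scale a season total to a rate per 90 minutes played."""
    return total * 90 / minutes


def qualified_rates(players, min_minutes):
    """Drop players below the minutes cut-off and add per-90 rates."""
    rows = []
    for p in players:
        if p["minutes"] < min_minutes:
            continue  # too little playing time to say anything useful
        rows.append({
            "name": p["name"],
            "goals_assists_p90": per_90(p["goals"] + p["assists"], p["minutes"]),
            "xg_p90": per_90(p["xg"], p["minutes"]),
        })
    # Highest goals+assists per 90 first
    return sorted(rows, key=lambda r: r["goals_assists_p90"], reverse=True)


ranking = qualified_rates(players, MIN_MINUTES)
```

The cut-off matters because per-90 rates get very noisy for players with only a handful of minutes.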

Allsvenskan

players_01

players_02

Player profiles

There are some players in the plots above worth looking closer at. Let’s start out with last year’s top scorer, Emir Kujovic:

players_03

Kujovic has just recently signed with Belgian side Gent, leaving reigning champions Norrköping after a couple of highly productive seasons. His goal and xG output this season are on par with last season’s but he’s also doubled his assists per 90 minutes, having been involved in almost a goal per game this season. I don’t know much about Belgian football, but given the right kind of attacking style he may very well score some goals over there.

players_04

Häcken’s Paulo De Oliveira, or Paulinho as he’s called, leads the league in assists+goals per 90 with his impressive 1.13 made up of just goals. Having overperformed against xG since his return to Swedish football, Paulinho may very well be the best finisher in the league.

players_05

With Malmö struggling to capitalise on their xG last season, Vidar Kjartansson has grown into a very good signing for them. I haven’t seen him play that much, but given his shot profile he looks just like the strong center forward, scoring mostly from inside the box, that they needed to gradually phase out the aging Markus Rosenberg.

players_06

Struggling to take a regular spot at Malmö for the last couple of seasons, Pawel Cibicki’s move to newcomers Jönköpings Södra has worked out well for him. With more playing time, he has continued to improve his goal output and given his age he could be on his way to a bigger club abroad soon.

players_07

One who has already taken the next step is AIK’s loanee Carlos Strandberg, who after struggling at Russian CSKA Moscow returned to Sweden this season. The young and forceful striker picked up where he left off and has been crucial for his club this season. From his shot profile we can see that he favours shooting from the left, mostly due to his powerful left foot. Set to return to Russia soon, Strandberg will play his last game against Malmö this weekend.

players_08

Leading the youngest players in xG per 90 is another AIK striker, 16-year-old Alexander Isak, who has impressed so far and is supposedly targeted by a number of big European clubs. Yeah, the sample size is small, but given his very young age he’s done well and should he continue to improve he could grow into a class player.

 

Superettan

As I’ve said, I very rarely watch Superettan so I know next to nothing about most players in the league. Here I’ve just picked out a few interesting players from the ranking below to look closer at, and I’ll leave the shot profiles uncommented.

players_09

players_10

players_11

players_12

players_13

players_14

players_15

That’s it for now, but if you want to see shot profiles from any other player in these leagues, just hit me up on Twitter.

Preview: Allsvenskan Qualification Play-off

With the regular Swedish season over and Norrköping crowned champions, all that’s left now is to decide who’ll get the last spot in next year’s Allsvenskan. In this qualification play-off, Sirius, who finished 3rd in Superettan, are pitted against Allsvenskan’s 14th-placed Falkenberg in a two-game battle.

Let’s have a look at some stats for the teams, compared to both the teams in Allsvenskan (blue) and Superettan (red):

play_off_01

From this graph, Sirius actually look really good, with an especially strong defence even when compared to the Allsvenskan teams, while Falkenberg’s defence looks really poor. However, this doesn’t say much about how the teams compare to each other, since Falkenberg have had to face far tougher opponents in Allsvenskan.

play_off_02 play_off_03

Looking at the xG maps, what again stands out is the defensive performances of the teams. While Falkenberg have conceded a massive 415 shots, almost 14 per game, Sirius have only conceded 241 shots or about 8 per game. Not only that, Sirius’ xG per conceded attempt is 0.111 while Falkenberg’s is a staggering 0.154, meaning they concede shots in quite bad (for them) situations – not a good thing.

play_off_04

Looking at individual players we can see how Sirius’ Stefan Silva is the big overperformer here, with his 12 goals almost doubling his xG numbers. Also, Falkenberg seem to have more goalscoring options, with three players over 6 goals while Sirius only have Silva.

As always, I’m not willing to present any prediction for individual games, but here I had hoped to show the results of a simulation covering both play-off games, including possible extra time and penalty shoot-out. I have run such a simulation, but I’m not happy with the results as my model seems to be favouring Sirius too heavily. This is almost certainly due to the different leagues involved, making Sirius look way better than they would be against an Allsvenskan side. Since I only came up with the idea for this post this morning, I haven’t had the time to look into a possible league strength variable to use in the simulation.
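To give an idea of what a simulation like this can look like, here is a minimal sketch. It is not the simulation discussed above: it assumes goals in each leg follow independent Poisson distributions, that extra time is played at a third of the second-leg scoring rates, that there is no away-goals rule and that the shoot-out is a coin flip. All function names and scoring rates are invented for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson-distributed goal count (Knuth's multiplication method)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_tie(rng, xg_a_home, xg_b_away, xg_b_home, xg_a_away, pen_win_a=0.5):
    """Simulate one two-legged tie; return True if team A goes through."""
    # Leg 1 is played at A, leg 2 at B; aggregate goals decide the tie.
    a = poisson(rng, xg_a_home) + poisson(rng, xg_a_away)
    b = poisson(rng, xg_b_away) + poisson(rng, xg_b_home)
    if a == b:
        # Extra time at B's ground: 30 minutes, i.e. a third of a game (assumption).
        a += poisson(rng, xg_a_away / 3)
        b += poisson(rng, xg_b_home / 3)
    if a == b:
        # Penalty shoot-out treated as a (possibly weighted) coin flip.
        return rng.random() < pen_win_a
    return a > b


def win_probability(n_sims, seed=1, **rates):
    """Share of simulated ties won by team A."""
    rng = random.Random(seed)
    wins = sum(simulate_tie(rng, **rates) for _ in range(n_sims))
    return wins / n_sims


# Made-up scoring rates for two evenly matched teams.
p_even = win_probability(20000, xg_a_home=1.4, xg_b_away=1.1,
                         xg_b_home=1.4, xg_a_away=1.1)
```

A league strength adjustment of the kind mentioned above would go into the scoring rates before they are passed to the simulation, for example by scaling down the rates of the team from the lower league.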

But if I had to guess, I’d say that Sirius look like a really strong side and should possibly be considered favourites for promotion here, mostly due to Falkenberg’s nasty habit of conceding a lot of shots with high goal expectancies.

The Model part 3 – Expected Goals for Swedish Allsvenskan

Now that we’ve explored the Expected Goals concept and the data available for Swedish football, it’s finally time to build the model and put it to the test.

Setting up an Expected Goals model can be done in a number of ways, for example with the help of exponential decay, machine learning or some kind of regression model. I’ve chosen to use a logistic regression model because I think it has several advantages. Logistic regression is mostly used when the dependent variable only has two possible values, which translates well to football since a shot can end up either as a goal (1) or no goal (0). Also, logistic regression returns a calculated probability – i.e. our xG value. It’s also very easy to set up a logistic regression model and tinker with different variables using Python’s statsmodels library.
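To make the "calculated probability" part concrete, here is a small sketch of how a fitted logistic regression turns shot features into an xG value. The coefficients below are invented for illustration and are not the fitted values from this model; in statsmodels the fitting itself would typically be done with its Logit model.

```python
import math


def xg(features, coefs, intercept):
    """Logistic regression prediction: probability that a shot becomes a goal."""
    # Linear predictor: intercept plus coefficient times value for each feature
    z = intercept + sum(coefs[name] * value for name, value in features.items())
    # The logistic function squeezes z into a probability between 0 and 1
    return 1 / (1 + math.exp(-z))


# Hypothetical coefficients, chosen only so that the example behaves sensibly.
COEFS = {"distance": -0.12, "angle": 1.0, "penalty": 2.5}
INTERCEPT = -1.5

close_shot = xg({"distance": 8.0, "angle": 0.8, "penalty": 0}, COEFS, INTERCEPT)
long_shot = xg({"distance": 25.0, "angle": 0.3, "penalty": 0}, COEFS, INTERCEPT)
```

Whatever the coefficients are, the output always ends up between 0 and 1, which is what makes it usable as a goal probability.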

First off, the dataset needs to be divided into two parts: one for training (constructing) the model and one for testing it. This is done to guard against overfitting: if the same data is used for constructing the model as for evaluating it, the model may look better than it really is.
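A split like this can be as simple as dividing the shots by season. The sketch below uses the seasons stated further down in this post (2011-2014 for training, 2015 for testing); the shot records themselves are made up.

```python
# Hypothetical shot records; only the season matters for the split.
shots = [
    {"season": 2011, "goal": 0},
    {"season": 2012, "goal": 1},
    {"season": 2013, "goal": 0},
    {"season": 2014, "goal": 0},
    {"season": 2015, "goal": 1},
    {"season": 2015, "goal": 0},
]

TRAIN_SEASONS = range(2011, 2015)  # 2011 up to and including 2014
TEST_SEASON = 2015

# The model is fitted on the training shots only...
train = [s for s in shots if s["season"] in TRAIN_SEASONS]
# ...and evaluated on shots it has never seen.
test = [s for s in shots if s["season"] == TEST_SEASON]
```

Splitting by season rather than at random also mimics how the model would be used in practice: built on past seasons and applied to the current one.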

I’ve chosen a number of variables which all in some way make sense to test in the model. They include:

  • League: Could goal expectancy differ between Allsvenskan and Superettan? Since this variable isn’t numerical, it’s been recoded to either 0 (Allsvenskan) or 1 (Superettan).
  • Attempt type: That the goal expectancies for regular shots and penalties are completely different from each other is obvious to anyone interested in football. This variable has also been recoded to either 0 (shot) or 1 (penalty).
  • Distance to the center of the goal: It is probably easier to score the closer to goal the shot is taken.
  • Angle: The distance to goal doesn’t tell the whole story of the importance of shot location as shots taken from the same distance but at different angles at least should have different expectancies. Higher angle means a more central position, which would probably be easier to score from.
  • Game State: There’s been some work done on the importance of game state in football, and its use can be debated, but I’m at least going to try it out. It works by crediting teams who spend time having the lead. Teams start every game level at Game State 0. Going 1-0 up means a Game State of +1 while the trailing team’s Game State drops to -1, and so on.
  • Number of players on the pitch: I think this is a first for using number of players in Expected Goals models, at least I haven’t seen anybody use it before. I’ve decided to call it Man Strength for lack of a better term and it works much like Game State. If an opponent is sent off your Man Strength goes to +1, while it drops to -1 for the opposing side. The reasoning behind using a variable like this is that as you face fewer opponents the defensive pressure could be less than usual, resulting in a higher goal expectancy.
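As an illustration of how variables like these can be derived, here is a small sketch. The exact definitions used in the model aren't spelled out above, so this assumes shot coordinates in metres relative to the center of the goal, and takes the angle to be the one spanned by the two goalposts as seen from the shot location, which grows for more central positions. All function names are hypothetical.

```python
import math

GOAL_WIDTH = 7.32  # metres between the posts, per the Laws of the Game


def distance_to_goal(x, y):
    """Distance to the center of the goal.

    x: metres out from the goal line, y: metres sideways from the goal center.
    """
    return math.hypot(x, y)


def goal_angle(x, y):
    """Angle (in radians) spanned by the two posts as seen from (x, y)."""
    to_near_post = math.atan2(y + GOAL_WIDTH / 2, x)
    to_far_post = math.atan2(y - GOAL_WIDTH / 2, x)
    return to_near_post - to_far_post


def game_state(goals_for, goals_against):
    """0 when level, +1 when one goal up, -1 when one goal down, and so on."""
    return goals_for - goals_against


def man_strength(own_players, opponent_players):
    """+1 when the opponent is a player short, -1 when you are."""
    return own_players - opponent_players


# Two shots from the same distance (11 m): one central, one from a wide position.
central = goal_angle(11.0, 0.0)
wide = goal_angle(5.5, 9.526)
```

With this definition the central shot gets the larger angle even though both are taken from the same distance, which is the extra information the angle variable is meant to add.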

model_01

Let’s take a look at the individual goal expectancy for the variables. Goal expectancy for the two leagues is very similar, but the variable could possibly be of use if the leagues interact with the other variables differently. Attempt type is pretty obvious, with penalties having a higher value than regular shots. In the graphs showing distance and angle the values have been rounded off for presentation, while higher precision is used in the model. There are some outliers here due to small sample sizes at the higher values, but the patterns seem clear. It’s hard to tell from the graph if Game State is of any use since there isn’t much difference between the levels. But Man Strength shows a clear pattern: it certainly looks like goal expectancy rises when having more players on the pitch.

So let’s throw the training dataset (seasons 2011-2014) into a logistic model and have a look at a summary of the results:

model_02

There are a lot of numbers here but let’s just focus on the p-values for each variable. Every variable is significant at the 5% level (p<0.05) except league. As expected from the plot above, there’s apparently no use in separating Allsvenskan and Superettan shots. Here’s how the model summary looks without the league variable:

model_03

So, with only significant variables left in the model, how does it perform when compared to actual goals? I’ve had the model calculate total xG for each player in Allsvenskan and Superettan for our test season (2015), and plotted this against their actual goals scored the same season.

model_04

With an r-squared of 0.77 I’d say the model is performing pretty well. What’s more encouraging is that the slope of the fitted line seems to be very close to 1, meaning that 1 expected goal is pretty much equal to 1 actual goal scored.

model_05

In the graph I’ve also plotted the players in the top 10 in either goals scored, xG, goals per 90 or xG per 90 for the season. Some of them have good numbers in several of the stats. Emir Kujovic and Henok Goitom, for example, are performing outstandingly this season, both being crucial to their respective teams’ run at the title. Markus Rosenberg on the other hand is underperforming with only 9 goals scored compared to his 16 expected goals, which is one of the reasons why Malmö are not living up to expectations this season. Örebro’s Broberg and Häcken’s Paulinho de Oliveira also make the list due to their great form in recent months, while Djurgården’s Mushekwi enjoyed a good goalscoring run in the first half of the season.
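For anyone wanting to check a fit like the one quoted above, the slope and r-squared of goals against xG can be computed with ordinary least squares. This is a minimal sketch in plain Python; the player totals are invented and the function name is hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least squares fit of y on x: returns slope, intercept, r-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, r-squared is the squared correlation coefficient
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up season totals: model xG and actual goals for a handful of players.
player_xg = [1.0, 3.5, 6.0, 9.0, 12.5]
player_goals = [1, 3, 7, 9, 12]

slope, intercept, r_squared = linear_fit(player_xg, player_goals)
```

A slope near 1 together with an intercept near 0 is what "1 expected goal equals 1 actual goal" looks like in numbers; a high r-squared on its own would not guarantee that.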

Let’s take a look at how the model performs on a team level:

model_06 model_07

On a team level, it looks like the model is performing better when it comes to xG against than for, but overall it is a reasonably good fit, although not as good as at player level.

model_08

As we can see the top teams are all performing well offensively. Göteborg stand out defensively with only 17 goals against in 27 games, even outperforming their excellent xG against at about 24. On the other end of the scale, Halmstad’s attack is underperforming with only 18 goals compared to over 35 xG.

model_09

That’s it for now when it comes to building my Expected Goals model for Swedish football, but I will probably bring it up again if I make any improvements and just maybe I’ll show how it’s been performing on the betting market. In my next post I’ll see how my model predicts the final table. Who will it pick as champion?

The Model part 2 – The Data

In my last post I discussed the concept of Expected Goals and how its probabilistic nature opens up for simulations. Today I’m going to talk about another cornerstone when building my model – the data. I do this because I think it’s important to fully explore the data when building a model, to understand its strengths and weaknesses, its advantages and limitations and how these affect the model and its output and performance. No model is perfect, but if we’re aware of its biases and limitations we can still make good use of it.

While Opta produces very advanced data covering every on-ball event in the bigger leagues, the data available for Swedish football is poorer in terms of detail, quality and reliability. What’s available for use is pretty much just shots, and there is no distinction between different types of shots besides penalties. Only shots that ended up as goals have detailed information on whether they were headed, came from a set piece and so on. Using this information would result in a skewed model, rating for example headers too high since every recorded header is also a goal. I’ve therefore treated all these types of situations as regular shots.

Furthermore, the location of the shots is recorded with less accuracy than Opta’s. The x and y coordinates are recorded only as integers, making them less precise, and the location of the shots is sometimes plain wrong. I regularly examine the shot maps of games I’ve watched live and there always seem to be some errors, but I’m hoping these will be insignificant. There’s no information on passes, defensive actions or anything like that; the only events recorded besides shots are fouls, corners, offsides, substitutions and cards.

Data exists for the top league Allsvenskan, but also second tier Superettan and the two Division 1 leagues below it, from season 2011 and onwards. However, the data from Division 1 seems to be of too poor quality for modelling and substitutions were not recorded properly until season 2013, so per90 stats from seasons 2011 and 2012 are pretty much useless. Anyway, here’s a shot map of every shot recorded for Allsvenskan and Superettan from season 2011 up till now.

data_01

With so many shots taken from the exact same locations, it’s probably easier to get a sense of the distribution of the shots through a hexbin plot, showing what could be described as the shot density of every location on the pitch:

data_02

As we can see, the penalty box and the area just in front of it seem to be the most frequent shooting locations, which makes sense. Also, the penalty spot stands out with so many shots taken from the exact same location.

data_03

Looking at only goals, the penalty spot again stands out but we can also see that most goals are scored inside the box, especially from more central locations. This again makes sense.

It’s also a good idea to take a look at the general characteristics of the games you want to model, so I’ve created some histograms of goal and shot distributions from Allsvenskan.

data_04

Examining these, we can see that an average game ends up with a total of 2.74 goals, with the home side having a 0.433 goal advantage. What about shots?

data_05

As expected, the home side also enjoys an advantage when it comes to shots, about 2.481 on average, while the average total number of shots in an Allsvenskan game is 21.931.
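Averages like the ones quoted above are straightforward to compute from per-game totals. Here is a minimal sketch, using a few invented games rather than the real dataset; the function name and record layout are assumptions.

```python
# Hypothetical games: (home_goals, away_goals, home_shots, away_shots)
games = [
    (2, 1, 14, 9),
    (0, 0, 10, 11),
    (1, 3, 12, 13),
    (3, 0, 16, 7),
]


def total_and_home_advantage(home, away):
    """Average total per game and average home-minus-away difference."""
    n = len(home)
    avg_total = sum(h + a for h, a in zip(home, away)) / n
    avg_advantage = sum(h - a for h, a in zip(home, away)) / n
    return avg_total, avg_advantage


goals_total, goals_advantage = total_and_home_advantage(
    [g[0] for g in games], [g[1] for g in games])
shots_total, shots_advantage = total_and_home_advantage(
    [g[2] for g in games], [g[3] for g in games])
```

The same function works for both goals and shots since the home advantage is just the average difference between the home and away columns.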

I think we have a good sense of the league and games we want to model now, so I’ll end this post here. Next up I’ll get down to business, building the model and putting it to the test.
