In a series of posts I will be covering the work done on and with my model on Swedish football. In this first part of the series I’ll talk about the underlying concept upon which the model is built – Expected Goals.
We’ve all seen those games where the result ended up being extremely unfair given how the game played out. Maybe the dominant team had a spell of bad luck and conceded an own goal while missing their clear chances, or the opposing goalkeeper played the game of his career making some huge saves, or maybe the lesser side luckily managed to score through their only real chance. All these scenarios point to the same thing – there’s a lot of randomness associated with goals. We often see teams play great and still lose, while a poorly playing side takes home all three points.
Because of this random nature of football, looking only at results and goals scored and conceded is not a good way to assess true team and player strength. Sure, good teams usually win, but they also sometimes run into spells of bad form and perform worse, while bad teams sometimes go on a good run, securing that last safe spot in the table just before the season ends.
To combat this problem, the football analytics community has turned its eyes to a more stable part of the game – shots – in the hope that these will exhibit less randomness and hold more explanatory power. While it is certainly true that examining how many shots a team produces and concedes can tell you more than just goals, the same problem with randomness exists here too. Good teams usually take more shots than they concede but, as we all know, this is not always the case.
Expected Goals aims to get down to the core of why good teams perform well and bad teams perform worse, and in the process avoid some of the problems associated with just summing up goals and shots. It is based on the notion that good teams take more shots in good situations while bad teams do the opposite. The same is true in defence, as good teams concede fewer shots in good situations than bad teams do. The hope is that these characteristics will be less random and more useful in explaining and predicting football.
In its essence, Expected Goals gives you a value of how often a typical shot ended up in the net, and this is done by examining huge datasets in a number of different ways. Usually an Expected Goals model is based on where on the pitch the shot was taken and the reason for this is quite clear once you come to think about it – it all comes down to shot quality. Imagine two different scoring opportunities, the first being 25 meters out from the goal and the other being 5 meters from the goal. In traditional football reporting these two shots will be treated just the same, but we all know that the latter is preferable since it is closer to goal and probably an easier shot to make.
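To make the idea of location-based shot quality concrete, here is a minimal sketch in Python. It is not how my model is built; it simply groups shots into distance bands and takes the share that were scored in each band. The shot list, the band edges and the function names are all made up for illustration and do not come from my database.

```python
from collections import defaultdict

# Hypothetical toy data: (distance to goal in meters, scored?).
# These numbers are invented purely for illustration.
shots = [
    (4, True), (5, False), (6, True), (5, True),
    (12, False), (14, True), (11, False), (15, False),
    (24, False), (26, False), (25, True), (28, False), (23, False),
]

def distance_band(distance_m):
    """Assign a shot to a coarse distance band (band edges are arbitrary)."""
    if distance_m < 10:
        return "0-10 m"
    if distance_m < 20:
        return "10-20 m"
    return "20+ m"

def xg_by_band(shots):
    """Return the share of shots scored in each distance band."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for distance_m, scored in shots:
        band = distance_band(distance_m)
        attempts[band] += 1
        goals[band] += int(scored)
    return {band: goals[band] / attempts[band] for band in attempts}

xg_table = xg_by_band(shots)
for band, xg in xg_table.items():
    print(f"{band}: xG = {xg:.2f}")
```

With a real dataset the bands would hold thousands of shots each, and a proper model would likely use finer zones or a fitted curve rather than three crude bands.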
Given the different methods, ideas and datasets football analysts work with, there’s no right way to calculate an Expected Goals or xG value. For example, an ambitious analyst might account for not only where the shot was taken, but also what type of shot it was, what kind of pass preceded the shot, if the player dribbled before taking the shot etc. The possibilities are only limited by the data, and with the likes of Opta covering the top European leagues, these are vast.
Let’s take a look at a real example. In my database (more on that in a later post) I have 243 penalties recorded, of which 192 ended up in goal. To get the xG value for a penalty we just need to calculate the fraction of penalties which turned into goals, in this case 192/243, or about 0.79. In comparison, the xG value for a shot taken from the same penalty spot during regular play is estimated by my model to be about 0.25, which makes sense since it’s a harder shot than the penalty.
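The penalty calculation above is nothing more than a conversion rate, which can be written out in a few lines of Python. The two counts are the ones from my database quoted above; the function and variable names are just illustrative.

```python
def conversion_rate(goals, attempts):
    """Share of attempts that ended in a goal."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return goals / attempts

penalties_taken = 243   # penalties recorded in the database
penalties_scored = 192  # of which ended up in goal

penalty_xg = conversion_rate(penalties_scored, penalties_taken)
print(f"Penalty xG: {penalty_xg:.2f}")  # prints "Penalty xG: 0.79"
```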
As shown by several football analysts (for example on the blog 11tegen11), Expected Goals holds some real power at explaining football results. But it also has its weaknesses. There’s currently no way of accounting for the position of the defenders when the shot was taken, which surely would affect scoring expectation. Furthermore, Expected Goals only deals with actual shots taken but, as we all know, not all scoring chances produce a shot. It’s also true that xG values are averages, meaning that there’s actually a whole range of different expectations for different players. Surely Leo Messi will have a higher chance to score than Carlton Cole in nearly every situation.
To me, the real strength of Expected Goals lies in the fact that we can treat it as a probability and use it in simulations in order to examine more complex situations. Take a look at the penalty for example. With an xG value of 0.79, we can expect an average player to score most of the time, but he’ll also miss some shots. In fact, it’s not uncommon for him to miss several shots in a row. With the help of Monte Carlo simulation (again, more on that in later posts), we can examine the nature of the penalty shot more closely. Let’s say we get our player to take 10,000 penalty shots in a row. How many will he make?
As we can see our player started out by making his first shot only to quickly drop below the expected 79% scoring rate, but as he took more and more shots he slowly moved towards his expected scoring rate. He actually scored 7928 penalties which is very close to the expected 7900.
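For anyone who wants to try this at home, a simulation like the one above can be sketched in Python as follows. This is a minimal version using only the standard library and is not necessarily the code I ran myself; the function name and the seed are arbitrary choices, and a different seed will give a slightly different goal count than my 7928.

```python
import random

def simulate_penalties(n_shots, xg, seed=None):
    """Simulate n_shots independent penalties, each scored with probability xg.

    Returns the total goals and the running scoring rate after each shot.
    """
    rng = random.Random(seed)
    goals = 0
    running_rate = []
    for shot_number in range(1, n_shots + 1):
        # rng.random() is uniform on [0, 1), so this is a goal with probability xg
        if rng.random() < xg:
            goals += 1
        running_rate.append(goals / shot_number)
    return goals, running_rate

goals, running_rate = simulate_penalties(10_000, 0.79, seed=1)
print(f"Scored {goals} of 10,000 ({goals / 10_000:.1%})")
print(f"Rate after 10 shots: {running_rate[9]:.2f}")
print(f"Rate after 1,000 shots: {running_rate[999]:.3f}")
```

The running rate is what you would plot to see the scoring rate wobble early on and then settle near 0.79 as the number of shots grows.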
Let’s try a more complex simulation just for fun. Imagine a penalty shoot-out. How likely is it to make all five shots? Four out of five? My database doesn’t contain any penalty shoot-outs, but my guess is that these are converted at a slightly lower rate than regular penalties, either due to the stress involved or maybe fatigue. But let’s use our standard xG value of 0.79 for simplicity. Let’s simulate 10,000 shoot-outs with five penalties each.
Given the conditions we’ve set up, it seems there’s about a 30% chance to score all five penalties, while making four is the most likely outcome. Going goalless from this shoot-out looks rather unlikely. But as I’ve said, the true chance of scoring after playing 120 minutes, with the hopes of thousands (or millions) of people on your shoulders, is probably lower, so a goalless shoot-out is probably more likely than our simulation shows.
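The shoot-out simulation can be sketched in the same way, and since five independent penalties follow a binomial distribution we can also check the simulated shares against the exact probabilities. As before, this is a minimal sketch rather than the exact code behind my numbers; it assumes every kick is independent with the same xG of 0.79, and the function names and seed are arbitrary.

```python
import random
from collections import Counter
from math import comb

def simulate_shootouts(n_shootouts, kicks, xg, seed=None):
    """Simulate shoot-outs and return the share that ended with 0..kicks goals."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_shootouts):
        # Each kick is an independent goal with probability xg
        goals = sum(rng.random() < xg for _ in range(kicks))
        counts[goals] += 1
    return {k: counts[k] / n_shootouts for k in range(kicks + 1)}

def exact_probability(goals, kicks, xg):
    """Binomial probability of scoring exactly `goals` out of `kicks`."""
    return comb(kicks, goals) * xg**goals * (1 - xg) ** (kicks - goals)

simulated = simulate_shootouts(10_000, 5, 0.79, seed=1)
for goals in range(6):
    exact = exact_probability(goals, 5, 0.79)
    print(f"{goals} goals: simulated {simulated[goals]:.3f}, exact {exact:.3f}")
```

The exact figures are roughly 0.31 for five goals and 0.41 for four, which agrees with the simulation, while a goalless shoot-out comes out at about 0.0004.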
That’s it about Expected Goals for now, and as I’ve said we will explore the possibilities of Monte Carlo simulation more thoroughly later. In my next post about my model I’ll talk about the data used for building it.