Stake sizing, Part 1

In the last post we used Python code to take a look at a classic gambling situation, the coin flip, to make a point about the importance of choosing the highest odds available to bet at. Today, we’ll again use the coin flipping example to investigate another fundamental principle of successful gambling: stake sizing.

Now, imagine we’re one of the lucky punters from the last post who were allowed to bet on a fair coin flip at odds of 2.03. As I stated then, this is pretty much like a license to print money – but how much of your bankroll should you bet on each flip of the coin? Knowing that the coin was indeed fair and you would be getting the best of it, a natural instinct could be to bet as much as you could possibly cough up, steal and borrow in order to maximize your profit. This is a poor strategy though, as we’ll soon come to see.

The reason for this is that even if we have come across a profitable proposition, our edge when betting on a (I’ll emphasize it again: fair) coin flip at 2.03 odds is only 1.5% – meaning that for each 1 unit bet we are expected to net 0.015 units on average. This should be absolute basics for anyone interested in serious gambling, but to make sure we’re all on the same page I’ll throw some maths at you:

The Expected Value, or EV, of any bet is, simply put, the sum of all outcomes multiplied by their respective probabilities – indicating the punter’s average profit or loss on each bet. So with our coin flip, we’ll win a net of 1.03 units 50% of the time and lose 1 unit 50% of the time; our EV is therefore 1.03 * 0.5 + (-1 * 0.5) = 0.015, for a positive edge of 1.5% and an average profit of 0.015 units per bet. For these simple types of bets though, an easier way to calculate EV is to divide the given odds by the true odds and subtract 1: 2.03 / 2.0 – 1 = 0.015.
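Both ways of calculating EV described above can be checked with a few lines of Python. This is a minimal sketch; expected_value and edge_from_true_odds are hypothetical helper names of my own, not part of the simulation code below. The shortcut works because, with a win probability of p = 1 / true odds, p × (odds – 1) – (1 – p) simplifies to odds / true odds – 1.

```python
def expected_value(odds, p_win, stake=1.0):
    """Expected net profit of one bet at decimal odds, given the true win probability."""
    win_part = p_win * stake * (odds - 1)   # net win times its probability
    loss_part = (1 - p_win) * stake         # stake lost times its probability
    return win_part - loss_part

def edge_from_true_odds(odds, true_odds):
    """Shortcut for simple bets: offered odds divided by true odds, minus 1."""
    return odds / true_odds - 1

print(round(expected_value(2.03, 0.5), 4))       # 0.015
print(round(edge_from_true_odds(2.03, 2.0), 4))  # 0.015
```

The same functions give –0.015 for Pinnacle’s 1.97 odds and 0.10 for odds of 2.20.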

An edge of only 1.5% is nothing to scoff at though; empires have been built on less, so we’ll definitely want to bet something – but how much?

Stake sizing is largely down to personal preferences about risk aversion and tolerance of the variance inherent in gambling, but with some Python code we can at least look at a few different strategies before we set out to chase riches and glory flipping coins. Just like in the last post, I’ll give you the code with some comments in it, which will hopefully guide you through what’s happening, before I briefly explain it.

Here we go:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def coin_flips(n=10000,odds=1.97,bankroll=100,stake=1,bankrupt=False):
    '''
    Simulates 10000 coinflips for a single punter, betting at 1.97 odds,
    also calculates net winnings

    NEW: default bankroll and stake set at 100 and 1, respectively
    now also calculates if player went bankrupt or not
    '''

    # create a pandas dataframe for storing coin flip results
    # and calculate net winnings
    df = pd.DataFrame()
    # insert n number of coinflips, 0=loss, 1=win
    df['result'] = np.random.randint(2,size=n)
    # calculate net winnings
    df['net'] = np.where(df['result']==1,stake*odds-stake,-stake)
    # calculate cumulative net winnings
    df['cum_net'] = df['net'].cumsum()

    # calculate total bankroll
    df['bankroll'] = df['cum_net'] + bankroll

    # if bankroll goes below the default stake, punter will stop betting
    # count times bankroll < stake
    df['bankrupt'] = np.where(df['bankroll']<stake,1,0)
    # count cumulative bankruptcies, with column shifted one step down
    df['bankruptcies'] = df['bankrupt'].cumsum().shift(1)
    # the shift always leaves the first row as NaN, so replace it with 0
    df.fillna(0,inplace=True)
    # drop all flips after first bankruptcy
    if bankrupt:
        df = df[df['bankruptcies']==0]

    return df

First off, we’ll modify our original coin_flips function to take our punter’s bankroll and stake size into consideration, setting the bankrupt threshold at the point where a default sized bet can no longer be made. By default, our punter will have an endless stream of 100 unit bankrolls, but if we set the parameter bankrupt to True, the function will cut away any coin flips after his first bankruptcy.
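The shifted cumulative sum is what makes the cut-off work: the flip that pushes the bankroll below the stake is kept (so its loss is counted), while every flip after it is dropped. Here is a toy illustration on a made-up bankroll path of seven flips with a 1 unit stake; the numbers are hypothetical and chosen only to show the mechanics.

```python
import pandas as pd

stake = 1
# hypothetical bankroll after each of seven flips
bankroll = pd.Series([4, 3, 2, 1, 0, 1, 2], dtype=float)

# 1 on flips that leave the bankroll below the stake
bankrupt = (bankroll < stake).astype(int)
# bankruptcies that happened *before* each flip; the shift makes row 0 NaN
bankruptcies = bankrupt.cumsum().shift(1).fillna(0)

kept = bankroll[bankruptcies == 0]
print(kept.tolist())  # [4.0, 3.0, 2.0, 1.0, 0.0]
```

The flip that took the bankroll to 0 survives the filter, but the two flips after it do not.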

def many_coin_flips(punters=100,n=10000,odds=1.97,bankroll=100,stake=1,color='r',plot=False,bankrupt=False):
    '''
    Simulates 10000 coinflips for 100 different punters,
    all betting at 1.97 odds,
    also calculates and plots net winnings for each punter

    NEW: now also saves punter bankruptcies
    '''

    # list for collecting each punter's results
    # (building a list and converting it once replaces DataFrame.append,
    # which was removed in pandas 2.0)
    rows = []
    # loop through all punters
    for i in np.arange(punters):
        # simulate coin flips
        df = coin_flips(n,odds,bankroll,stake,bankrupt)
        # calculate net
        net = df['net'].sum()
        # flag punters who went bankrupt at some point (1=yes, 0=no)
        bankruptcy = int(df['bankrupt'].any())

        # store this punter's results
        rows.append({'odds':odds,
                     'net':net,
                     'bankrupt':bankruptcy})

        if plot:
            # plot the cumulative winnings over time
            df['cum_net'].plot(color=color,alpha=0.1)

    # create pandas dataframe with one row per punter
    punter_df = pd.DataFrame(rows)
    # check if punters ended up in profit
    punter_df['winning'] = np.where(punter_df['net']>0,1,0)

    return punter_df

We also want to modify the many_coin_flips function so that it’ll also take bankroll and stake size into consideration, counting up how many of our punters went bankrupt.

We won’t use the compare_odds function here, instead we’ll write a new one to compare stake sizing – but if we ever want to use it again sometime in the future a few minor changes will be needed here as well:

def compare_odds(punters=100,n=10000,odds=[1.97,2.00,2.03]):
    '''
    Simulates and compare coin flip net winnings
    after 10000 flips for 3 groups of punters,
    betting at odds of 1.97, 2.00 and 2.03, respectively.
    Also plots every punters net winnings
    '''

    # create figure and ax objects to plot on
    fig, ax = plt.subplots()

    # set y coordinates for annotating text for each group of punters
    ys = [0.25,0.5,0.75]
    # assign colors to each group of punters
    cs = ['r','y','g']

    # loop through the groups of punters, with their respective odds,
    # chosen color and y for annotating text
    for odd, color, y in zip(odds,cs,ys):
        # run coin flip simulation with given odds, plot with chosen color
        df = many_coin_flips(punters,n,odd,color=color,plot=True)
        # calculate how many punters in the group ended up in profit
        winning_punters = df['winning'].mean()
        # set a text to annotate
        win_text = '%.2f: %.0f%%' %(odd,winning_punters * 100)
        # annotate odds and chance of profit for each group of punters
        ax.annotate(win_text,xy=(1.02,y),
                    xycoords='axes fraction', color=color,va='center')

    # set title
    ax.set_title('Chances of ending up in profit after %s coin flips' %n)
    # set x and y axis labels
    ax.set_xlabel('Number of flips')
    ax.set_ylabel('Net profit')
    # add annotation 'legend'
    ax.annotate('odds: chance',xy=(1.02,1.0),
                xycoords=('axes fraction'),fontsize=10,va='center')
    # add horizontal line at breakeven point
    plt.axhline(color='k',alpha=0.5)
    # set y axis range at some nice number
    ax.set_ylim(-450,450)

    # show plot
    plt.show()

Now, with all our previous coin flip functions taking bankroll and stake size into consideration, we can go ahead and evaluate a few stake sizing strategies with a new function:

def compare_stakes(punters=200,n=10000,odds=2.03,stakes=[100,50,25,10,5,2,1,0.5],bankroll=100):
    '''
    Similar to compare_odds, but here we instead want to compare different
    staking sizes for our coin flips betting at 2.03 odds

    Increased number of punters in each group, from 100 to 200

    Also prints out the results
    '''

    # list to collect one row of results per stake size
    rows = []

    # colors for the stacked bars: green=win, yellow=lost, red=bankrupt
    # (explicit colors, since seaborn's default palette order differs between versions)
    colors = ['g','y','r']

    # loop through the groups of punters, one group per stake size
    for stake in stakes:
        # run coin flip simulation with given stake
        df = many_coin_flips(punters,n,odds,stake=stake,bankrupt=True)
        # calculate how many punters in the group ended up in profit
        winning_punters = df['winning'].mean()
        # ...and how many went bankrupt
        bankrupt_punters = df['bankrupt'].mean()
        # lost money (or broke even) but not bankrupt
        lose = 1 - winning_punters - bankrupt_punters

        # store results for this stake size
        rows.append({'stake':stake,
                     'win':winning_punters,
                     'lose':lose,
                     'bankrupt':bankrupt_punters})

    # pandas df with the results, columns in the order win, lose, bankrupt
    results_df = pd.DataFrame(rows)

    # set stake as index
    results_df.set_index('stake',inplace=True)

    # plot on a single figure and ax pair (avoids an extra empty figure)
    fig, ax = plt.subplots()
    results_df.plot(kind='bar',stacked=True,color=colors,alpha=0.8,ax=ax)
    # fix title, axis labels etc
    ax.set_title('Simulation results: betting %s coin flips at %s odds, starting bankroll %s' %(n,odds,bankroll))
    ax.set_ylabel('Share of punters')
    # set legend outside plot
    ax.legend(bbox_to_anchor=(1.2,0.5))

    # add percentage annotation for both win and bankrupt
    for x, w, l, b in zip(np.arange(len(results_df)),results_df['win'],results_df['lose'],results_df['bankrupt']):
        # calculate y coordinates
        win_y = w/2
        lost_y = w + l/2
        bankr_y = w + l + b/2

        # annotate win, lose and bankrupt %, only if >=4%
        if w >= 0.04:
            ax.annotate('%.0f%%' %(w * 100),xy=(x,win_y),va='center',ha='center')
        if l >= 0.04:
            ax.annotate('%.0f%%' %(l * 100),xy=(x,lost_y),va='center',ha='center')
        if b >= 0.04:
            ax.annotate('%.0f%%' %(b * 100),xy=(x,bankr_y),va='center',ha='center')

    plt.show()

By default, our new compare_stakes function creates one group of 200 punters per stake size, all betting on fair coin flips at 2.03 odds with a starting bankroll of 100 units. For each group and its staking plan, the function records how many ended up in profit, how many lost and how many went bankrupt.
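The three-way split can be thought of as a simple rule applied to each punter’s bankroll path: bankruptcy takes precedence, and otherwise a punter is a winner only if he finished above his starting bankroll. The sketch below restates that rule outside pandas; classify_punter and summarise are hypothetical helpers, not part of the code above. As in compare_stakes, "lose" is whatever is left over, so it also covers the rare punter who ends exactly at break-even.

```python
import numpy as np

def classify_punter(bankroll_path, start_bankroll, stake):
    """Label one punter's run as 'bankrupt', 'win' or 'lose'.

    bankroll_path holds the bankroll after each flip.
    """
    path = np.asarray(bankroll_path, dtype=float)
    if (path < stake).any():          # could no longer afford the stake at some point
        return 'bankrupt'
    return 'win' if path[-1] > start_bankroll else 'lose'

def summarise(labels):
    """Share of punters in each category; the three shares add up to 1."""
    labels = np.asarray(labels)
    return {k: float(np.mean(labels == k)) for k in ('win', 'lose', 'bankrupt')}

labels = [classify_punter([101.03, 102.06], 100, 1),  # finished above 100: win
          classify_punter([99.0, 98.0], 100, 1),      # down, but can still bet: lose
          classify_punter([50.0, 0.5], 100, 1)]       # dropped below the 1 unit stake: bankrupt
print(labels)
print(summarise(labels))
```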

As we can see on the plot below, the results differ substantially:

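[Figure 01: share of punters who ended up in profit, lost money or went bankrupt for each stake size, betting at 2.03 odds]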

Just like last time, I want to remind you that any numbers here are only rough estimates, and increasing the size of each punter group as well as the number of coin flips will get us closer to the true values.
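To get a feel for how rough, a common back-of-the-envelope check is the binomial standard error of a simulated share, sqrt(p(1 – p) / N) for a group of N punters. A minimal sketch; the 50% share below is only an example value, not a number read off the plot.

```python
import math

def proportion_standard_error(p, n_punters):
    """Approximate standard error of a share p estimated from n_punters simulated runs."""
    return math.sqrt(p * (1 - p) / n_punters)

# e.g. a group of 200 punters where roughly half end up in profit
se = proportion_standard_error(0.5, 200)
print(round(se, 3))  # 0.035
```

So a 50% share from 200 punters could easily come out a few percentage points either way. Because the error shrinks with the square root of N, ten times as many punters only makes it about 3.2 times smaller.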

So what can we learn from the above plot? Well, the main lesson is that even if you have a theoretically profitable bet, your edge will count for next to nothing if you are too bold with your staking. Putting your whole bankroll at risk will see you go bankrupt around 96% of the time, and even if you bet as little as 2 units, you’ll still face a considerable risk of screwing up a lucrative proposition. The truth is that with such a small edge, keeping your bets small as well is the way to go if you want to make it in the long run.
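One rough way to see why smaller stakes survive so much better, without simulating anything, is the diffusion (Brownian motion with drift) approximation for the chance of ever going broke: P(ruin) ≈ exp(–2 × μ × B / σ²), where μ and σ² are the mean and variance of the net result of one bet and B is the starting bankroll. This is my own swapped-in cross-check, not what the functions above compute, and it is crude: it ignores the 10,000-flip horizon and treats zero, rather than one stake, as the bankruptcy line. approx_ruin_probability is a hypothetical name.

```python
import math

def approx_ruin_probability(odds, p_win, stake, bankroll):
    """Rough chance of ever losing the bankroll when betting a fixed stake.

    Diffusion approximation: P(ruin) ~ exp(-2 * mu * B / var).
    Ignores the finite number of flips, so it tends to overstate ruin.
    """
    win, loss = stake * (odds - 1), -stake
    mu = p_win * win + (1 - p_win) * loss                 # expected net per bet
    var = p_win * win**2 + (1 - p_win) * loss**2 - mu**2  # variance per bet
    if mu <= 0:
        return 1.0  # no positive drift: ruin is (approximately) certain eventually
    return math.exp(-2 * mu * bankroll / var)

for stake in [100, 50, 25, 10, 5, 2, 1, 0.5]:
    print(stake, round(approx_ruin_probability(2.03, 0.5, stake, 100), 2))
```

For a 100 unit stake the formula gives roughly 0.97. Since μ grows with the stake and σ² with the stake squared, the exponent scales with bankroll / stake, so under this approximation halving the stake squares the ruin probability.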

But what if some fool offered us even higher odds, let’s say 2.20? First off, we would have to check if the person was A: mentally stable, and B: rich enough to pay us if (or rather, when) we win, before we go ahead and bet. Here our edge would be 10% (2.2 / 2.0 – 1), more than six times as large as in the 2.03 situation, so we’ll likely be able to bet more – but how much? Well, the functions are written with this in mind, enabling us to play around with different situations and strategies. Setting the odds parameter of our new function to 2.20, here’s what betting on a fair coin flip at 2.20 odds would look like:

[Figure 02: the same stake comparison, betting at 2.20 odds]

As can be seen from the new plot, with a larger edge we can go ahead and raise our stake size considerably, hopefully boosting our winnings as well. So the main take-away from this small exercise is that even if you have an edge, if you want to make it in the long run you’ll have to be careful with your staking to avoid blowing up your bankroll – but also that the larger your edge, the larger you can afford to bet.
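The same rough diffusion approximation, P(ruin) ≈ exp(–2 × μ × B / σ²), can be turned around to ask how large a fixed stake can be before the approximate ruin probability passes some target. The 5% target and the name max_stake_for_ruin are my own illustrative choices; treat the output as a ballpark figure, not as what the simulation would show.

```python
import math

def max_stake_for_ruin(odds, p_win, bankroll, max_ruin=0.05):
    """Largest fixed stake whose approximate ruin probability is at most max_ruin.

    Per bet, mu grows with the stake and var with the stake squared, so
    P(ruin) <= max_ruin solves to stake <= 2 * mu1 * B / (var1 * ln(1 / max_ruin)).
    """
    mu1 = p_win * (odds - 1) - (1 - p_win)               # mean net per 1 unit staked
    var1 = p_win * (odds - 1)**2 + (1 - p_win) - mu1**2  # variance per 1 unit staked
    if mu1 <= 0:
        return 0.0  # no edge: no fixed stake is safe in the long run
    return 2 * mu1 * bankroll / (var1 * math.log(1 / max_ruin))

for odds in (2.03, 2.20):
    print(odds, round(max_stake_for_ruin(odds, 0.5, 100), 2))
```

With a 100 unit bankroll this comes out at roughly 1 unit at 2.03 odds and roughly 5.5 units at 2.20, which matches the direction of the plots: the larger the edge, the larger the stake you can afford.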

That’s it for now, but I’ll hopefully be back soon with a Part 2 about stake sizing, looking at a staking plan that actually takes your (perceived) edge into account when calculating the optimal stake size: The Kelly Criterion.


Flipping coins, and the importance of betting at the highest odds

As I stated in the previous post, this blog will now focus more on gambling, using Python code to investigate whatever comes to my mind around the subject.

Today I’ll have a look at a classic gambling example – the flip of a coin – but before I go ahead and talk you through the code I want to address a few things that I know some of you will be wondering about. Though R seems to be the language preferred by most in the football analytics scene, I have chosen Python simply because I feel it is much more intuitive and easier to learn. RStudio seems to be the tool of choice for the R folks, but I don’t know of any really dominant counterpart for Python. I use Spyder, available through downloading Anaconda, mainly because it’s easy to use and comes with a lot of useful stuff pre-installed. If you’re thinking about testing it out yourself, I would suggest switching the color scheme of the editor to Zenburn for that dark and cool programming look that really makes your code look super important, and running your scripts in the included IPython console.

One final, very important thing: I am not in any way an expert programmer, statistician, mathematician or anything like that. I am simply a gambler looking to use these fields to get an edge. It’s totally OK to simply copy and paste any code I publish here to use yourself and play around with it however you may wish. If you notice any mistakes or if something doesn’t add up, please comment. I’m happy to learn new stuff.


The inspiration for this post came the other day when I noticed that a few hours prior to kick-off in this year’s Super Bowl, the bookmaker Pinnacle offered 1.97 odds on the opening coin flip. A sucker bet, I thought to myself, knowing the true odds of a fair coin to be 2.00. The coin flip is a very popular Super Bowl prop bet though and as it was pointed out to me on Twitter, a few books actually offered the fair odds of 2.00. Choosing the highest odds available is crucial if you want to make money gambling in the long run, so I decided to write up a nice little Python script to visualise my point.

The layout of these blog posts will be that I simply throw a piece of code at you, before explaining it. The comments in the code itself should also help you out; for those of you who already know Python much of it will be simple basics, while those who are completely new to coding or Python will hopefully learn a few things.

Here we go:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def coin_flips(n=10000,odds=1.97):
    '''
    Simulates 10000 coinflips for a single punter,
    betting at 1.97 odds,
    also calculates net winnings
    '''

    # create a pandas dataframe for storing coin flip results
    # and calculate net winnings
    df = pd.DataFrame()
    # insert n number of coinflips, 0=loss, 1=win
    df['result'] = np.random.randint(2,size=n)
    # calculate net winnings
    df['net'] = np.where(df['result']==1,odds-1,-1)
    # calculate cumulative net winnings
    df['cum_net'] = df['net'].cumsum()

    return df

Alright, so after importing all the needed modules for this piece, we go ahead and define our first function, coin_flips, which will be used to simulate the coin flips and calculate the net winnings of a single punter. I’ve chosen 10,000 flips and Pinnacle’s odds of 1.97 as our default values here.

Creating a pandas dataframe, we can easily store the result of each coin flip. Now, as we assume that the coin is fair, there’s no need to even consider which side our punter would call each time; instead we can simply use numpy to simulate a series of ones and zeros, representing either a win or a loss. Calculating the net result of each flip is also very straightforward: when he wins, our punter pockets the net profit at the offered odds, 0.97 units, while a loss lightens his pocket by 1 unit. Calculating the cumulative net winnings is also very easy using pandas’ built-in cumsum method.
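coin_flips draws its ones and zeros with np.random.randint, which uses NumPy’s legacy global random state, so every run is different. If you want to reproduce a run exactly, for example when comparing small tweaks to the code, one option is a seeded generator from np.random.default_rng. This is a sketch; coin_flips_seeded is a hypothetical variant, not the function used in this post.

```python
import numpy as np
import pandas as pd

def coin_flips_seeded(n=10000, odds=1.97, seed=None):
    """Like coin_flips, but using NumPy's Generator API with an optional seed."""
    rng = np.random.default_rng(seed)
    # n flips, 0=loss, 1=win (the upper bound 2 is exclusive)
    df = pd.DataFrame({'result': rng.integers(0, 2, size=n)})
    # net result of each flip: odds - 1 on a win, -1 on a loss
    df['net'] = np.where(df['result'] == 1, odds - 1, -1.0)
    # running total of net winnings
    df['cum_net'] = df['net'].cumsum()
    return df

a = coin_flips_seeded(seed=42)
b = coin_flips_seeded(seed=42)
print(a['cum_net'].equals(b['cum_net']))  # True
```

Leaving seed as None gives a fresh, unseeded run each time, just like the original function.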

For coding reasons, the function is set to return the dataframe, so calling it will simply make a lot of numbers pop up, but running the coin_flips()['cum_net'].plot() command in the IPython console will let you simulate a punter’s coin flips and also plot his cumulative net winnings like this:

[Figure 01: one punter’s cumulative net winnings over 10,000 coin flips at 1.97 odds]

Every time you run the command another simulation will run with a new, different result. Doing this a couple of times, you’ll likely understand why I described this as a sucker bet. Sure, you can get lucky and win, even a couple of times in a row – but betting with the odds against you, you’ll find it very hard to make a profit long term.

But that single punter flipping coins 10,000 times actually doesn’t say that much, maybe he just got unlucky? To dig deeper we want to know just how likely you are to end up with a profit after 10,000 coin flips. So we write another function, using the previous one to simulate the results of many more punters betting on 10,000 coin flips. How many do you think will end up in profit?

def many_coin_flips(punters=100,n=10000,odds=1.97,color='r'):
    '''
    Simulates 10000 coinflips for 100 different punters,
    all betting at 1.97 odds,
    also calculates and plots net winnings for each punter
    '''

    # list for collecting each punter's results
    # (DataFrame.append was removed in pandas 2.0)
    rows = []
    # loop through all punters
    for i in np.arange(punters):
        # simulate coin flips
        df = coin_flips(n,odds)
        # calculate net
        net = df['net'].sum()
        # store this punter's results
        rows.append({'odds':odds,
                     'net':net})

        # plot the cumulative winnings over time
        df['cum_net'].plot(color=color,alpha=0.1)

    # create pandas dataframe with one row per punter
    punter_df = pd.DataFrame(rows)
    # check if punters ended up in profit
    punter_df['winning'] = np.where(punter_df['net']>0,1,0)

    return punter_df

The slightly more complicated many_coin_flips function uses the earlier coin_flips to loop through a group of punters, 100 by default, and save their results into a new pandas dataframe, punter_df, where we assign a 1 to all punters who ended up in profit while all the losers get a 0. We also plot each punter’s cumulative net winnings in a nice red color to symbolise their (very) likely bankruptcy.
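many_coin_flips calls coin_flips once per punter in a Python loop. If speed matters, numpy can simulate every punter at once as a 2D array with one row per punter. The sketch below is a swapped-in vectorised alternative of my own (share_of_winners is a hypothetical name); it only returns the share of winners, though each row of cum_net is a punter’s path and could be plotted the same way.

```python
import numpy as np

def share_of_winners(punters=100, n=10000, odds=1.97, seed=None):
    """Fraction of punters in profit after n flat 1-unit bets, without a Python loop."""
    rng = np.random.default_rng(seed)
    results = rng.integers(0, 2, size=(punters, n))  # one row per punter, 0=loss, 1=win
    net = np.where(results == 1, odds - 1, -1.0)     # net result of every flip
    cum_net = net.cumsum(axis=1)                     # each punter's running total
    return (cum_net[:, -1] > 0).mean()

for odds in (1.97, 2.00, 2.03):
    print(odds, share_of_winners(odds=odds, seed=1))
```

The whole 100 × 10,000 array only takes a few megabytes per intermediate step, so memory is not an issue at these sizes.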

This function also returns a dataframe so running it will again make a lot of numbers pop up in the console, but it also plots out the financial fate of each punter, like this:

[Figure 02: cumulative net winnings of 100 punters, each betting on 10,000 coin flips at 1.97 odds]

As we can see, a few of our 100 punters actually got lucky enough to end up winning after 10,000 coin flips. But most of them ended up way below the break-even point, losing a lot of money. If this was a real group of punters we can only hope that even if they were stupid enough to set out betting on 10,000 coin flips at these odds, they’ll at least at some point realise their mistake and quit.

But how about if we change the offered odds? As I mentioned earlier, some books actually put up the fair odds of 2.00. How would 100 punters do after 10,000 coin flips betting at those odds? Well, we’ll have to write a new function for that. Also, just for fun (or to make a point) I’ve included an additional group of 100 punters lucky enough to be allowed to bet on the coin flips at odds of 2.03 – literally a license to print money.

def compare_odds(punters=100,n=10000,odds=[1.97,2.00,2.03]):
    '''
    Simulates and compare coin flip net winnings
    after 10000 flips for 3 groups of punters,
    betting at odds of 1.97, 2.00 and 2.03, respectively.
    Also plots every punters net winnings
    '''

    # create figure and ax objects to plot on
    fig, ax = plt.subplots()

    # set y coordinates for annotating text for each group of punters
    ys = [0.25,0.5,0.75]
    # assign colors to each group of punters
    cs = ['r','y','g']

    # loop through the groups of punters, with their respective odds,
    # chosen color and y for annotating text
    for odd, color, y in zip(odds,cs,ys):
        # run coin flip simulation with given odds, plot with chosen color
        df = many_coin_flips(punters,n,odd,color)
        # calculate how many punters in the group ended up in profit
        winning_punters = df['winning'].mean()
        # set a text to annotate
        win_text = '%.2f: %.0f%%' %(odd,winning_punters * 100)
        # annotate odds and chance of profit for each group of punters
        ax.annotate(win_text,xy=(1.02,y),
                    xycoords='axes fraction', color=color,va='center')

    # set title
    ax.set_title('Chances of ending up in profit after %s coin flips' %n)
    # set x and y axis labels
    ax.set_xlabel('Number of flips')
    ax.set_ylabel('Net profit')
    # add annotation 'legend'
    ax.annotate('odds: chance',xy=(1.02,1.0),
                xycoords=('axes fraction'),fontsize=10,va='center')
    # add horizontal line at breakeven point
    plt.axhline(color='k',alpha=0.5)
    # set y axis range at some nice number
    ax.set_ylim(-450,450)

    # show plot
    plt.show()

This last function makes use of the two previous ones to simulate the coin flips of our three groups of punters, plotting their total net winnings all on the same ax object, which we later make use of to add a title and some nice labels to the axes. We also add a horizontal line to be able to better compare the punters’ winnings with the break-even point, as well as some text annotation to explain the colors of the three groups.

Now, running the compare_odds() function in the IPython console will hopefully result in something like this:

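[Figure 03: cumulative net winnings of three groups of 100 punters, betting at 1.97, 2.00 and 2.03 odds]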

Here we clearly see just how important betting at the highest odds really is. Keep in mind though that the numbers to the right are only rough estimates. As you can see, the yellow group of punters who bet at the fair odds of 2.00 did not win exactly 50% of the time, but close enough. I actually had to re-run the function a few times to get this close, but that’s only natural since we only had 100 punters in each group, a very small number in this context. The more punters and coin flips we use in our simulations, the closer we’ll come to the real win percentages – but here speed is more important than super accuracy.
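For a fair coin the exact chance of ending in profit can also be computed directly, which gives a benchmark for the simulated percentages. With k wins out of n flat 1 unit bets, the net result is k × (odds – 1) – (n – k), which is positive exactly when k > n / odds, so we just add up the binomial probabilities of those k. A sketch; prob_profit_fair_coin is a hypothetical name, and it uses exact integer arithmetic to avoid floating point underflow with 10,000 flips.

```python
import math

def prob_profit_fair_coin(n=10000, odds=1.97):
    """Exact chance that n flat 1-unit bets on a fair coin at decimal odds end in profit."""
    k_min = math.floor(n / odds) + 1   # fewest wins that give a net profit
    c = math.comb(n, k_min)            # ways to get exactly k_min wins
    favourable = 0
    for k in range(k_min, n + 1):
        favourable += c
        c = c * (n - k) // (k + 1)     # C(n, k+1) from C(n, k), exact integer division
    # Python divides these huge integers into a correctly rounded float
    return favourable / 2**n

for odds in (1.97, 2.00, 2.03):
    print(odds, round(prob_profit_fair_coin(odds=odds), 3))
```

This gives roughly 6%, 49.6% and 93% for the three groups. Note that even at fair 2.00 odds the chance of a strict profit sits slightly below 50%, because about 0.8% of punters end up exactly at break-even.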

So as we clearly see in the above plot, betting on the coin flip at Pinnacle’s 1.97 odds really is a sucker bet, albeit an entertaining one if you were planning to watch the Super Bowl. But if you hope to make a profit from your betting, finding the highest available odds to bet on is crucial, as is shown by the green group of punters who were allowed to bet at odds of 2.03. It’s only a difference of 0.06, but it makes all the difference in the long run. The margins in betting are tiny, but they add up over time.

The lessons learned here can easily be transferred to sports betting in general and football betting in particular, where the Asian Handicap and Over/Under markets focus on odds around even money. The coin flip example is special though as we knew the true odds of the bet beforehand, something you’ll never be able to know betting on football. But as shown in the last plot, by consistently betting at the highest available odds, you at least give yourself a much better chance of ending up in profit.


Allsvenskan 2016 – The Endgame

Before I continue with another Allsvenskan 2016 update – the last before the season ends – I have some news regarding the blog.

As some of you may know, I’ve been working part time for StrataBet this season, mostly writing game previews for the Norwegian Tippeligaen. As I soon take on a new, full-time job elsewhere I likely won’t have the time to write as much as I want. Also, with my new job focusing on Allsvenskan and Swedish football in general, I may be reluctant to give away too much information to the general public, so the future of this blog is very uncertain.

I’m hoping to continue writing in some form though, and what I do write will likely be closely linked to StrataBet as they’ve given me access to their great dataset.


Ok, so let’s get on with another update. With only 3 rounds left – the next starts tonight – we can see how much of the drama has gone out of the league table since last time. Malmö have retaken the top spot and thanks to Norrköping’s recent poor form the gap down to the title contenders is now 4 points. Sure, both Norrköping and AIK can still theoretically win the title, but I would be very surprised if Malmö let this slip out of their hands, despite the disappointing defeat to Östersund. They do have some disturbing injury problems though…

[Figure 01]

Göteborg have a chance to break into the top-3 and gain a European spot for next season, but this looks even more unlikely with 7 points up to AIK. At the other end of the table the bottom-3 have looked locked in for a long time. Helsingborg still have a chance to overtake Sundsvall, but again I’d be very surprised if this happens. In mid-table we see how Elfsborg, Kalmar and Hammarby have climbed a few spots at the expense of Örebro, Häcken and Östersund.

[Figure 02]

Counting up shots we see how Djurgården surprisingly is the best defensive side when it comes to denying the opposition chances to shoot. We also see how Gefle continue to be very bad and that Örebro still is the main outlier with A LOT of shots both taken and conceded.

[Figure 03]

Looking at effectiveness up front we see few changes since last time. Elfsborg have been slightly more effective with their shooting though, partly explaining their climb in the table. On the other end of the scale, Helsingborg have had a real problem scoring on their chances lately, with ZERO goals since the last update.

[Figure 04]

Looking at defensive effectiveness we see why Djurgården’s ability to deny the opposition chances hasn’t seen them climb into the upper half of the table: They still concede a lot of goals on the chances they do allow. Only bottom-of-the-table Falkenberg are worse. With Malmö and Norrköping’s effectiveness declining since last time, AIK now stands out as the far superior defensive side.

[Figure 05]

Not much has changed in terms of chance quality either – but what is interesting here is that Djurgården is the best defensive side when it comes to xG as well. So if they concede very few chances, and very little xG – why are they conceding all those goals? My guess is – I don’t have time to look it up – that the few chances they do concede are of higher quality. Djurgården have also had a lot of problems with goalkeepers this season. Having used 4 keepers so far, only star signing Andreas Isaksson has looked stable enough, but he has picked up an injury and will be out for the remainder of the season.

I don’t know much about evaluating goalkeepers but have been thinking about doing a blog post about it for some time now, hopefully I’ll get to it in the near future.

[Figure 06]

Looking at Expected Goals Difference, we see how Djurgården’s lack of defensive effectiveness has robbed them of a nice upper half finish. My model currently ranks them as 5th in the league, close to Hammarby in 4th – far above their current 11th place.

We also see how AIK have overtaken Norrköping in 2nd place, and with the reigning champions in poor form and just 3 points above AIK, this is where most of the drama left in the season lies. At the bottom of the table, Helsingborg are actually ranked far better than Sundsvall above them, but the 7 point gap will likely be too much for Henrik Larsson’s men with only 3 games remaining.

[Figure 07]

The model has always liked Malmö and they actually have the chance to secure the title tonight, if Norrköping lose away to Elfsborg while Malmö win away to Falkenberg – a not too unlikely outcome. In the race for 2nd place, AIK now have the upper hand much thanks to Norrköping’s recent poor results. Göteborg seems to have all but locked in the 4th place and the same goes for the bottom 3.

To continue my slight focus on Djurgården in this post, they’re interestingly projected to take about 6 points from their 3 remaining games: Helsingborg away, Häcken at home and Sundsvall away. Given their very disappointing season, and as a cynical Djurgården supporter, I doubt this.


Allsvenskan round 23 update

It’s been over six weeks since my last Allsvenskan update but now I finally have time to get to it. Six rounds have been played since last time and a lot has happened. Let’s take a look at the league table:

[Figure 00]

Compared to last time, we can immediately see that reigning champions Norrköping have climbed up above Malmö to claim the top spot, which is very impressive given the players who have left the club and the mid-season managerial change.

At the other end of the table, Djurgården have (luckily for me) picked up pace under new manager Dempsey and moved up from 14th to 11th, while Helsingborg and Sundsvall have struggled – only picking up 2 points each.

Let’s have a closer look at how the teams have performed:

[Figure 01]

Despite giving up the first place in the table to Norrköping, Malmö have distanced themselves from the rest in terms of shot dominance. Not much else has changed, Örebro are still involved in some very open games while Gefle struggle to create chances.

[Figure 02]

Örebro and Elfsborg have moved into the ‘constant threat’ quadrant thanks to some effective scoring, while Hammarby have done the opposite. Kalmar have improved their effectiveness, but at the same time seen a drop in shots taken per game.

[Figure 03]

Here we see how AIK’s and Norrköping’s improvements come mainly from their defensive work; both sides have been better at keeping shots from going in since the last update. Kalmar’s defensive effectiveness has improved as well.

[Figure 04]

[Figure 05]

Expected goals for and against look much like they did last time but AIK’s defensive improvements have seen them close in on the top 2 sides, as they’ve increased their xGD by nearly 0.20 per game.

How about a prediction then?

[Figure 06]

Malmö’s defeat to Djurgården has really opened up the title race, but my model still fancies them. Norrköping have improved though, and we could be in for a very interesting finish to the season. AIK have improved as well, and have seemingly all but locked in a top-3 spot. At the other end of the table Falkenberg have plummeted from around 22 expected points to less than 16, with the model giving them no chance of reaching the relegation play-off spot occupied by Helsingborg.

Djurgården under Mark Dempsey

As mentioned earlier, as a Djurgården supporter I’m very happy with how the form has improved under new manager Dempsey. In the last update I showed the long-term trends leading up to Olsson’s sacking, and now that Dempsey’s been in charge for 7 games we can see how he’s managed to turn things around:

[Figure 07]

While shots conceded actually declined during Olsson’s last season, so did shots taken. What we see under Dempsey is clear: everything has improved! Djurgården now concede fewer shots and take more, but more importantly, both actual goal difference and xG difference have improved, leading to more points and a climb in the league table.

This may be hindsight, but through my work with Norwegian football I was optimistic about Dempsey coming in, as I knew he would provide the energy needed for a turnaround. Let’s hope Djurgården can continue to pick up points and climb further.

Passing spiders

Another thing I mentioned in the last update was that Opta data is now available for Allsvenskan, and I showed some passing maps heavily inspired by 11tegen11 and David Sumpter. I’ve since played around with the script to create passing map animations, which received a lot of positive feedback on Twitter and have been dubbed ‘passing spiders’, often quite a fitting name.

I don’t know enough about tactics to determine whether these animations hold any analytical value, but they are fun to look at and could possibly be combined with other types of analysis to provide an interesting narrative of individual games. I got a lot of good advice on how to improve the animations and will implement some of it in the future.

That’s it for now!

Expected Goals for Damallsvenskan, Sweden’s top women’s league

Watching the Swedish Women’s National Team beat Brazil to advance to the Olympic final, I suddenly realised that the shot location data I collect for Allsvenskan is available for the top women’s league, Damallsvenskan, as well. I then thought about Chad Murphy’s work collecting data MANUALLY for the USA’s NWSL and decided I should build an Expected Goals model for Damallsvenskan.

Hoping that the data was of the same quality as that for Allsvenskan, and with the code more or less already in place from my other projects, I plotted out the xG difference for each team this season:

Wow, only 3 teams with positive xGD! Had I found a bug in the code, or was the data faulty? Having next to no knowledge about the league, I looked it up, and yeah, Damallsvenskan is very skewed towards a few top teams:

Rosengård and Linköping really dominate the league and have all but locked up the Champions League spots with their 30+ points already, while the four bottom teams all have fewer than 10 points. As we’ve seen above, Linköping are clearly the superior side when it comes to Expected Goals.

Let’s dig deeper:

Here we see part of why Linköping have such a commanding lead in the xGD table. They take far more shots per game than anybody else, over 4 more than their closest challengers Rosengård. The two top teams look superior when it comes to shots, but what about effectiveness?

The top teams stand out here too, but this time it’s Rosengård with the superior numbers, as they score with nearly every fourth shot! Kristianstad, on the other hand, are very ineffective, needing about 13 shots per goal.

The two top sides don’t stand out defensively, instead it’s Kopparbergs/Göteborg who dominate the defensive effectiveness with over 13 shots faced per goal conceded. At the bottom, Mallbacken are struggling defensively with under 5 shots faced per goal conceded!

Looking at Expected Goals for and against, we really see how Linköping in particular dominate the league, keeping a solid defence while producing a crazy ~3 xG per game!

Player stats

Ok, so that’s it for teams – what about individual players? I ran my player script and was happy to see two players I’ve heard of despite not following women’s football: Stina Blackstelius I know from the Swedish national team, and Marta I know because she’s Marta.

There are a lot of players with a Goal Contribution of around 1 per game, and as expected they’re mostly from the top teams. As a Djurgården supporter, I’m happy to see young Johanna Kaneryd placing second among the younger players. So what about Expected Goals?

Not surprisingly, xG queens Linköping occupy the top three spots when it comes to xG per 90 mins. What is surprising though is that top contender Rosengård only have one player in the top 10, with Marta just missing out by 0.03 xG90.

Prediction

What about a prediction then? As we’ve seen, the model clearly ranks Linköping highest and as they’ve got one game in hand and Rosengård left to face at home, this really shows up in the prediction with the model giving them 72% to take the title.

Allsvenskan round 17 update & Opta data!

Ok, it’s time for another Allsvenskan update. With 5 rounds having been played since last time we should be able to see some changes.

Looking at the table, we see how Djurgården, Örebro and Sundsvall have all dropped a few places, while Östersund and Häcken are the big winners. As a result of Djurgården’s poor performance, Pelle Olsson has been sacked and replaced by Mark Dempsey.

As I noted last time, the league seems to have settled when it comes to shots. Indeed, no team has changed quadrant since the last update.

Looking at attacking effectiveness, we see how Gefle have become more clinical in front of goal while Djurgården have become less effective, partly explaining their struggles.

Only Elfsborg have changed defensive quadrant since last time, dropping from ‘competent but busy’ to ‘pushovers’. Hammarby and Sundsvall have also dropped a bit while Malmö have improved their defence.

Malmö and Norrköping are still at the top of the xG table, but the big surprise is Östersund’s rise to third place – mostly due to an improvement in their attacking output. Despite their recent struggles, Djurgården still sit in sixth place. Helsingborg have climbed up to 13th, leaving Gefle at the bottom.

Looking at Expected Points, we can see just how bad Djurgården have performed recently. They are ranked sixth by xPoints but sit at the bottom of the xPoints Performance table, about 9 points below expectation. In my simulations, they reached at least their current total of 18 points about 98% of the time – indicating a massive underperformance.

My game simulation model still considers Malmö heavy favourites for the title. I certainly agree, but 93% is much too high considering they’re only 1 point ahead of Norrköping at the moment.

Djurgården managerial change, and Opta data

As a Djurgården supporter I’ve welcomed Pelle Olsson’s sacking, as DIF have been very poor under him this season. Shot dominance has decreased since last season, but more importantly, the actual goal difference and Expected Goals Difference have plummeted. The club’s situation looks alarmingly similar to when Per-Mathias Høgmo came in to save us from relegation in 2013. Hopefully Høgmo’s former assistant coach can repeat that feat this autumn.

More news is that Opta data is now available for Allsvenskan. I probably won’t have time to dig too deep into it at the moment, but I’ve written a script for plotting passing networks, heavily influenced by 11tegen11 and David Sumpter.

Using these plots, we can compare Pelle Olsson’s last game using a 4-4-2 formation (Opta has it down as a weird 4-2-2-2 though) against Dempsey’s first game in charge, where he used the same formation.

Sure, it’s only one game – but we can see some distinct differences here, as Dempsey used a midfield diamond with Kevin Walker pushing up while Alexander Faltsetas dropped deeper. Olsson has always favoured two holding central midfielders. Dempsey has also gone for a more straightforward approach to attacking, with more direct passes up towards the strikers, while Olsson relied more on crossing.

Another Allsvenskan 2016 update

As the league has now gone on a summer break for UEFA Euro 2016, let’s take another look at the Allsvenskan season so far.

Since last time, Malmö have overtaken Norrköping at the top, and the early surprise side Sundsvall have dropped to 6th. AIK, Kalmar and Häcken have climbed in the table, while Djurgården, Hammarby and Helsingborg have done the opposite. Gefle and Falkenberg still struggle at the bottom.

Shots-wise the league seems to have settled, as only Elfsborg and Göteborg have changed quadrants since the last update. Also, Malmö’s gap to the other clubs has decreased.

Looking at effectiveness in attack, we can see partially why some sides have climbed or dropped in the league table. Malmö and Häcken have enjoyed some efficient scoring, moving them from ‘wasteful’ into the ‘constant threat’ quadrant, while Djurgården have done the opposite.

Defensively, we see how AIK have been more effective at the back together with Sundsvall and Jönköpings Södra, while Djurgården’s performance has worsened.

Looking at xG, we see how AIK have overtaken Malmö as the best attacking side, but have at the same time moved into the ‘worse defence’ half. Hammarby’s attacking numbers have dropped while Falkenberg have performed better. Östersund and Örebro still sit at opposite ends, with the former involved in some low-xG games and the latter producing some xG-fests, with both defensive and attacking xG at about 1.8 per game.

Malmö are still at the top of the xGD table, but have dropped a bit from the >1.0 they had last time. Kalmar have climbed to third while Djurgården and AIK have dropped. The bottom three remain the same as last time.

Looking at Expected Points for a ‘fair’ table based on the shots taken and conceded so far, we see how Malmö are still at the top while Gefle are stuck at the bottom. Göteborg have overtaken AIK in the top three, while Kalmar have climbed, picking up about 8 of a possible 12 Expected Points since the last update.

A note on Expected Points Performance: winning teams will always outperform their Expected Points, as picking up all 3 points will usually be above expectation; no team dominates a game so completely as to warrant a 100% win probability. The same goes for teams who consistently lose, as 0 points will usually be below expectation.

Looking at time spent in Game States, we see how Gefle have spent just about 10% of the season in the lead so far. Helsingborg and Häcken have spent little time drawing, while Sundsvall have still spent very little time trailing.

Just like with the actual league table, there are some big differences in the prediction compared to the last update, showing how difficult it can be to predict the league this early in the season. Mid-table has really opened up since last time, but the top 2 and bottom 3 remain the same.

Long-term trends and managerial changes

Usually, I would’ve ended the post here, but as two managers have been sacked since the last update, I thought it would be interesting to see how AIK and Gefle have performed under Andreas Alm and Roger Sandberg respectively. I won’t comment on these plots beyond noting that Alm likely had to leave because of politics and disputes at the club, while Sandberg was sacked due to Gefle’s poor results.

[Figure: long-term trends, AIK]

[Figure: long-term trends, Gefle]

Allsvenskan 2016 update

With three rounds of Allsvenskan games played since my last post, it’s time for an update. Like last time, I’m just going to throw a few visualizations at you together with my initial thoughts without going too much in-depth.

Starting out with the league table, we see just how close the league has been so far – with eight games played, only four points separate Östersund in 11th place from Malmö in 2nd.

We can also see some interesting streaks since last time, with Norrköping and Elfsborg winning all three games while Hammarby, Gefle and Falkenberg have been struggling. Looking at the early surprise teams we see that Sundsvall have continued to perform well while Jönköpings Södra have dropped in the table.

Looking at shots, we see how Hammarby, Kalmar and Norrköping have all moved into the ‘busy attack, quiet defence’ quadrant, indicating that they’ve played a bit better lately (or faced easier opposition!), while Sundsvall are still stuck in the ‘quiet attack, busy defence’ quadrant.

While Malmö produce a lot of shots, they’re still one of the most ineffective sides up front. Göteborg and Norrköping, on the other hand, are enjoying some effective scoring at the moment.

Sundsvall are still conceding a lot of shots, but at least those shots aren’t being converted into goals very often – which in part explains their good results so far. Elfsborg have moved into the ‘formidable’ defensive quadrant, only conceding one goal in the last three games.

Looking at Expected Goals, Malmö are still clearly the best team, with Norrköping improving while Djurgården have dropped a bit. Here we really see the difference in recent performance between the early surprise teams, as Sundsvall have improved both their attacking and defensive numbers while Jönköpings Södra have done the opposite.

So how would the teams rank xG-wise? Expected Goals Difference should work well as a measure of skill, and here we again see how the model ranks Malmö as the best side so far, with Norrköping and AIK the main contenders. A bottom three of Helsingborg, Gefle and Falkenberg has also emerged.
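To make the xGD ranking concrete, here’s a minimal Python sketch that sums per-shot xG into xG for, xG against and xGD per game. It assumes each shot is stored with its game, shooting team, opponent and xG value; the `Shot` record and the `xg_table`/`rank_by_xgd` helpers are hypothetical names for illustration, and the team names and values in the example are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Shot:
    game_id: int   # hypothetical game identifier
    team: str      # team taking the shot
    opponent: str  # team facing the shot
    xg: float      # Expected Goals value of the shot


def xg_table(shots):
    """Return {team: (xG for, xG against, xGD)}, all per game played."""
    xg_for = defaultdict(float)
    xg_against = defaultdict(float)
    games = defaultdict(set)
    for shot in shots:
        xg_for[shot.team] += shot.xg
        xg_against[shot.opponent] += shot.xg
        # Both sides have played the game, whoever took the shot.
        games[shot.team].add(shot.game_id)
        games[shot.opponent].add(shot.game_id)
    table = {}
    for team, played in games.items():
        n_games = len(played)
        per_game_for = xg_for[team] / n_games
        per_game_against = xg_against[team] / n_games
        table[team] = (per_game_for, per_game_against, per_game_for - per_game_against)
    return table


def rank_by_xgd(table):
    """Team names sorted from highest to lowest xGD per game."""
    return sorted(table, key=lambda team: table[team][2], reverse=True)


# Made-up shots for three hypothetical teams.
shots = [
    Shot(1, "A", "B", 0.3),
    Shot(1, "B", "A", 0.1),
    Shot(2, "A", "C", 0.5),
]
print(rank_by_xgd(xg_table(shots)))  # ['A', 'B', 'C']
```

One limitation of this sketch: a game only counts towards a team’s per-game averages if at least one shot was taken in it.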

Another way of evaluating the teams’ performance so far is to simulate how many points on average each team would’ve received from their games. To do this I’ve used the shots from each game to simulate the result 10,000 times and the teams have then been awarded Expected Points based on the derived 1X2 probabilities.

For example, if the simulation came up with probabilities of 0.5, 0.3 and 0.2 for a home win, a draw and an away win respectively, the home side would be awarded 0.5*3 + 0.3*1, or 1.8 Expected Points, while the away side would get 0.2*3 + 0.3*1, or 0.9 Expected Points.
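The simulation and the Expected Points arithmetic above can be sketched in Python as follows. This sketch assumes each shot is an independent trial that scores with probability equal to its xG value, which is one common way to replay a game from its shots; the post doesn’t spell out the model’s details, so treat this as an illustration rather than the model itself. The shot values in the example are made up.

```python
import random


def simulate_game(home_xg, away_xg, n_sims=10_000, seed=None):
    """Estimate (home win, draw, away win) probabilities from a game's shots.

    Assumes each shot is an independent Bernoulli trial that scores with
    probability equal to its xG value.
    """
    rng = random.Random(seed)
    home_wins = draws = away_wins = 0
    for _ in range(n_sims):
        home_goals = sum(rng.random() < xg for xg in home_xg)
        away_goals = sum(rng.random() < xg for xg in away_xg)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
        else:
            away_wins += 1
    return home_wins / n_sims, draws / n_sims, away_wins / n_sims


def expected_points(p_home, p_draw, p_away):
    """Expected Points for (home side, away side): 3 for a win, 1 for a draw."""
    return 3 * p_home + p_draw, 3 * p_away + p_draw


# Hypothetical shot lists, not real match data.
probs = simulate_game([0.35, 0.12, 0.08], [0.25, 0.05], seed=42)
print(probs, expected_points(*probs))
```

Calling `expected_points(0.5, 0.3, 0.2)` gives roughly (1.8, 0.9), matching the worked example.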

Here’s a table of the teams’ Expected Points so far:

But a team can’t get 1.8 points from a game, only 0, 1 or 3 – so how have the teams performed compared to their Expected Points?

Note: Malmö have been awarded a 3-0 win against Göteborg, as the game was abandoned due to home fans throwing pyrotechnics towards a Malmö player. These points have been included.

Here we see how Helsingborg and Sundsvall have taken quite a lot more points than expected, while Falkenberg and Kalmar have done the opposite. This could be the result of some good/bad luck, but it could also mean that the model fails to properly assess the quality of these teams.

Let’s dig deeper and have a look at the Expected Points distribution of each team:

Looking at these distributions, we can see just how extreme the results have been for some of the teams so far. In fact, my model estimates that if we re-played Helsingborg’s games 10,000 times, they would get 13 points or more only about 5% of the time!
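A figure like that 5% can be estimated by replaying each game’s outcome from its 1X2 probabilities and counting how often the simulated points total reaches the team’s actual total. This is a minimal sketch, assuming the games are independent of each other; the helper names are hypothetical and the probabilities in the example are made up, not Helsingborg’s.

```python
import random


def points_distribution(game_probs, n_sims=10_000, seed=None):
    """Simulate a team's points total over the games it has played.

    game_probs holds one (p_win, p_draw, p_loss) tuple per game, e.g. from a
    shot-based game simulation. Games are treated as independent.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        points = 0
        for p_win, p_draw, _p_loss in game_probs:
            r = rng.random()
            if r < p_win:
                points += 3
            elif r < p_win + p_draw:
                points += 1
        totals.append(points)
    return totals


def prob_at_least(totals, actual_points):
    """Share of simulated runs where the team reached at least actual_points."""
    return sum(total >= actual_points for total in totals) / len(totals)


# Hypothetical 1X2 probabilities for eight games.
games = [(0.35, 0.30, 0.35)] * 8
totals = points_distribution(games, seed=7)
print(prob_at_least(totals, 13))
```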

Lastly, here’s my updated prediction of the final Allsvenskan 2016 table:

That’s it for now. I hope to be back with another update when the league has gone on break for Euro 2016, and maybe I’ll look closer at individual players then.

 

Allsvenskan 2016 so far

With 5 rounds of games played I thought it would be a good time to look at how the 2016 Allsvenskan is going. Let’s have a look at the league table so far:

True to its rather unpredictable nature, the opening five rounds of the 2016 Allsvenskan have seen some surprises, and I don’t think anyone expected Sundsvall and newly promoted Jönköpings Södra to be at the top! Also, last year’s top teams have been struggling a bit, but seem to have picked up the pace lately.

But never mind the table – although it never lies, it does give an unfair view of the teams’ underlying performances, especially with so few games played. To really see how the teams have been coming along so far, I’ve reproduced some of Ben Mayhew’s beautiful scatterplots:

Looking at shots taken and conceded per game, we can see how Malmö, Djurgården and AIK have dominated their games so far, outshooting their opponents by some margin. League leaders Sundsvall have, given their results, surprisingly spent most of their time in defence – but that’s just how Allsvenskan is.
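The shots quadrants can be sketched like this, assuming the dividing lines sit at the league averages of shots taken and conceded per game (the post doesn’t say exactly where they are drawn). The labels reuse the ‘busy/quiet attack, busy/quiet defence’ quadrant names from these updates; the `shot_quadrants` helper is a hypothetical name, and the teams and numbers in the example are made up.

```python
from statistics import mean


def shot_quadrants(shots_for, shots_against):
    """Place each team in a shots quadrant.

    shots_for / shots_against map team -> shots taken / conceded per game.
    Assumes the quadrants are split at the league averages of both axes.
    """
    avg_for = mean(shots_for.values())
    avg_against = mean(shots_against.values())
    labels = {}
    for team, taken in shots_for.items():
        attack = "busy attack" if taken > avg_for else "quiet attack"
        defence = "busy defence" if shots_against[team] > avg_against else "quiet defence"
        labels[team] = f"{attack}, {defence}"
    return labels


# Hypothetical per-game averages for three teams.
print(shot_quadrants({"A": 15.0, "B": 10.0, "C": 8.0}, {"A": 9.0, "B": 12.0, "C": 14.0}))
```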

 

When looking closer at shooting effectiveness we see that surprise teams Sundsvall and Jönköpings Södra have been clinical in front of goal so far, partly explaining their results. Häcken on the other hand have really struggled to score.

 

Looking at defensive effectiveness, we can really see why Sundsvall are at the top of the Allsvenskan table. While spending a lot of time in defence, they’ve managed to concede very few goals given the shots they’ve faced. Whether this is down to some new tactic, skill or simply dumb luck remains to be seen – but for a team like Sundsvall I’m willing to say it’s the latter.

 

Expected Goals-wise, we see just how lucky Sundsvall have been so far. They’ve conceded a lot of xG while failing to produce up front, putting them in the same group as struggling sides Falkenberg, Häcken, Helsingborg and Gefle. Malmö are at the other end of the scale, producing a lot of high-quality chances while keeping a tight defence.

Another interesting thing to look at is time spent in Game States. As a result of their good performance (or luck!) so far, Sundsvall have only spent about 1% of minutes played losing so far while Gefle have only spent 10% in the lead!
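Game State shares like these can be computed from goal times alone. This is a minimal sketch, assuming a hypothetical format of (minute, scoring team) tuples per game, 90-minute games with stoppage time ignored, and own goals credited to the team that benefits from them; the function names and example teams are made up for illustration.

```python
def _state(goal_diff):
    """Game state from one team's point of view, given its goal difference."""
    if goal_diff > 0:
        return "leading"
    if goal_diff < 0:
        return "trailing"
    return "drawing"


def game_state_minutes(goals, team, match_length=90):
    """Minutes a team spends leading, drawing and trailing in one game.

    goals is a list of (minute, scoring_team) tuples within 0..match_length.
    """
    minutes = {"leading": 0, "drawing": 0, "trailing": 0}
    goal_diff = 0  # team's goals minus opponent's goals
    last_minute = 0
    for minute, scorer in sorted(goals):
        minutes[_state(goal_diff)] += minute - last_minute
        goal_diff += 1 if scorer == team else -1
        last_minute = minute
    minutes[_state(goal_diff)] += match_length - last_minute
    return minutes


def game_state_shares(games, team, match_length=90):
    """Share of all minutes played spent in each game state across several games."""
    totals = {"leading": 0, "drawing": 0, "trailing": 0}
    for goals in games:
        for state, mins in game_state_minutes(goals, team, match_length).items():
            totals[state] += mins
    total_minutes = match_length * len(games)
    return {state: mins / total_minutes for state, mins in totals.items()}


# Team A scores after 10 minutes, team B equalises after 60.
print(game_state_minutes([(10, "A"), (60, "B")], "A"))
# {'leading': 50, 'drawing': 40, 'trailing': 0}
```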

What about a prediction for the rest of the season then? I’ve used the games so far to fire up my league table simulation based on my Monte Carlo xG game simulation, and this is the result:

[Table: predicted final Allsvenskan 2016 table]

Note: The first table posted here was wrong due to a minor error in the code. This is the correct table.

As a Djurgården supporter, I kinda like the result – even though I think it’s a bit unrealistic for us to compete for silverware just yet. And anyway, in my opinion a simulation of the whole season based on only 5 games tells us more about what has happened so far than about what we’ll see in the future.

The model clearly ranks Malmö as the best team in the league, as it has done pretty much every season in my database, alongside AIK and Göteborg. Both newly promoted teams, Östersund and Jönköpings Södra, seem competent xG-wise and have a good chance of staying up, while reigning champions Norrköping seem to be performing worse than last year. Gefle are always at the bottom of these kinds of tables, but nevertheless seem to outsmart every metric available to avoid relegation season after season – but maybe this is the year they finally drop down to Superettan?
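For readers curious about the mechanics, a league table simulation of this kind can be sketched as below: start from the current points, replay every remaining fixture from its 1X2 probabilities, and count where each team finishes. This is a simplified illustration under stated assumptions, not the actual script; the fixture format is hypothetical, and ties on points are broken at random rather than by goal difference.

```python
import random
from collections import Counter


def simulate_table(current_points, fixtures, n_sims=10_000, seed=None):
    """Estimate how often each team finishes in each league position.

    current_points: {team: points so far}
    fixtures: remaining games as (home, away, p_home, p_draw, p_away) tuples,
    e.g. from a shot-based game model. Ties on points are broken at random,
    a simplification, since the real table uses goal difference.
    """
    rng = random.Random(seed)
    finishes = {team: Counter() for team in current_points}
    for _ in range(n_sims):
        points = dict(current_points)
        for home, away, p_home, p_draw, _p_away in fixtures:
            r = rng.random()
            if r < p_home:
                points[home] += 3
            elif r < p_home + p_draw:
                points[home] += 1
                points[away] += 1
            else:
                points[away] += 3
        # Highest points first; the random second key breaks ties.
        order = sorted(points, key=lambda team: (points[team], rng.random()), reverse=True)
        for position, team in enumerate(order, start=1):
            finishes[team][position] += 1
    return {
        team: {pos: count / n_sims for pos, count in sorted(counts.items())}
        for team, counts in finishes.items()
    }


# Hypothetical three-team league with two games left.
result = simulate_table(
    {"A": 10, "B": 9, "C": 4},
    [("A", "B", 0.45, 0.27, 0.28), ("C", "A", 0.30, 0.28, 0.42)],
    seed=3,
)
print(result["A"].get(1, 0.0))  # share of simulations in which A finish first
```

A team’s title probability is then `result[team].get(1, 0.0)`, and relegation odds come from summing its shares for the bottom positions.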

I’m planning to do these kinds of updates at regular intervals, and maybe add some more plots and deeper analysis, but this will have to do for now!

A rough prediction of the new Allsvenskan season

Though I hadn’t planned on posting a prediction for the new Allsvenskan season until a couple of rounds had been played, after seeing Per Linde of fotbollssiffror post his prediction on Twitter and mention how he disagreed with it, I decided to do the same and fire up my league table prediction script from last season.

My Monte Carlo game prediction is designed to use at least a couple of rounds of data, so I was unsure how it would go about predicting a new season right from scratch, but I actually think it turned out better than expected:

[Table: predicted final Allsvenskan 2016 table]

There are some obvious problems though. First off, the script still thinks it’s 2015 and that Jönköpings Södra and Östersund are playing in Superettan, causing a strange error where every one of their games is simulated as a 0-0 draw. This obviously skews the prediction for every team, but it isn’t really a bug, as the league simulation script isn’t designed to handle teams from different leagues, and it will be corrected when I update the database with the weekend’s results.

Also, every game is simulated with the teams’ squads as they were at the end of the 2015 season, which is obviously a problem, with a lot of players coming and going since then – but again, this will be fine once I update the database.

What about the actual prediction then? Besides the error with the promoted teams the only problem I have subjectively is the high percentages for Gefle’s relegation (they’ve been ruled out as long as I can remember but have still managed to stay up year after year) and Norrköping’s title defence. I’d also switch places between Djurgården and Häcken while placing Örebro somewhere in lower mid-table. The promoted sides are hard to predict, but I definitely place Östersund above Jönköpings Södra.

Though I’m pretty happy with the prediction, I’ll update it in another post once a couple of rounds have been played.
