As I stated in the previous post, this blog will now focus more on gambling, using Python code to investigate whatever comes to my mind around the subject.
Today I’ll have a look at a classic gambling example – the flip of a coin – but before I go ahead and talk you through the code I want to address a few things that I know some of you will be wondering about. Though R seems to be the language preferred by most in the football analytics scene, I have chosen Python simply because I feel it is so much more intuitive and easier to learn. RStudio seems to be the tool of choice for the R folks, but I don’t know of any truly dominant counterpart for Python. I use Spyder, available through downloading Anaconda, mainly because it’s easy to use and comes with a lot of useful stuff pre-installed. If you’re thinking about testing it out yourself, I would suggest switching the color scheme of the editor to Zenburn for that dark and cool programming look that really makes your code look super important, and running your scripts in the included IPython console.
One final, very important thing: I am not in any way an expert programmer, statistician, mathematician or anything like that. I am simply a gambler looking to use these fields to get an edge. It’s totally OK to simply copy and paste any code I publish here to use yourself and play around with it however you may wish. If you notice any mistakes or if something doesn’t add up, please comment. I’m happy to learn new stuff.
Flipping coins, and the importance of betting at the highest odds
The inspiration for this post came the other day when I noticed that a few hours prior to kick-off in this year’s Super Bowl, the bookmaker Pinnacle offered 1.97 odds on the opening coin flip. A sucker bet, I thought to myself, knowing the true odds of a fair coin to be 2.00. The coin flip is a very popular Super Bowl prop bet though and as it was pointed out to me on Twitter, a few books actually offered the fair odds of 2.00. Choosing the highest odds available is crucial if you want to make money gambling in the long run, so I decided to write up a nice little Python script to visualise my point.
The layout of these blog posts will be that I simply throw a piece of code at you, before explaining it. The comments in the code itself should also help you out. For those of you who already know Python much of this will be simple basics, while those who are completely new to coding or Python will hopefully learn a few things.
Here we go:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def coin_flips(n=10000,odds=1.97):
    '''
    Simulates 10000 coinflips for a single punter, betting at 1.97 odds,
    also calculates net winnings
    '''
    # create a pandas dataframe for storing coin flip results
    # and calculate net winnings
    df = pd.DataFrame()
    # insert n number of coinflips, 0=loss, 1=win
    df['result'] = np.random.randint(2,size=n)
    # calculate net winnings
    df['net'] = np.where(df['result']==1,odds-1,-1)
    # calculate cumulative net winnings
    df['cum_net'] = df['net'].cumsum()
    return df
All right, so after importing all the needed modules for this piece, we go ahead and define our first function, coin_flips, which will be used to simulate the coin flips and calculate the net winnings of a single punter. I’ve chosen 10,000 flips and Pinnacle’s odds of 1.97 as our default values here.
Creating a pandas dataframe, we can easily store the result of each coin flip. Now, as we assume that the coin is fair, there’s no need to even consider which side our punter would call each time, instead we can simply go ahead and use numpy to simulate a series of ones and zeros, representing either a win or a loss. Calculating the net result of each flip is also very straightforward as when he wins, our punter will pocket the net end of the offered odds, 0.97, while losing will see his pocket lightened by 1 unit. Calculating the cumulative net winnings is also very easy using pandas’ built-in cumsum function.
For coding reasons, the function is set to return the dataframe so calling it will simply make a lot of numbers pop up, but running the coin_flips()['cum_net'].plot() command in the IPython console will let you simulate a punter’s coin flips, and also plot his cumulative net winnings like this:
Every time you run the command another simulation will run with a new, different result. Doing this a couple of times, you’ll likely understand why I described this as a sucker bet. Sure, you can get lucky and win, even a couple of times in a row – but betting with the odds against you, you’ll find it very hard to make a profit long term.
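To put a number on "the odds against you", here is a minimal sketch of the expected result of betting one unit per flip on a fair coin. The function names expected_net and net_std and the p_win parameter are illustrative choices of mine, not part of the script above.

```python
import math

def expected_net(odds, p_win=0.5, n=1):
    """Expected net winnings after n one-unit bets at the given decimal odds."""
    # a win pays odds - 1 units, a loss costs 1 unit
    ev_per_bet = p_win * (odds - 1) - (1 - p_win)
    return n * ev_per_bet

def net_std(odds, p_win=0.5, n=1):
    """Standard deviation of the net winnings after n independent one-unit bets."""
    # each bet's net is odds * (win indicator) - 1, so only the indicator varies
    return odds * math.sqrt(n * p_win * (1 - p_win))

for odds in (1.97, 2.00):
    print(odds, round(expected_net(odds, n=10000), 1), round(net_std(odds, n=10000), 1))
```

At 1.97 the expected result after 10,000 flips is a loss of about 150 units, with a standard deviation of roughly 98.5 units, so a punter needs to run well above average just to break even.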
But a single punter flipping coins 10,000 times actually doesn’t tell us that much. Maybe he just got unlucky? To dig deeper we want to know just how likely you are to end up with a profit after 10,000 coin flips. So we write another function, using the previous one to simulate the results of many more punters betting on 10,000 coin flips. How many do you think will end up in profit?
def many_coin_flips(punters=100,n=10000,odds=1.97,color='r'):
    '''
    Simulates 10000 coinflips for 100 different punters, all betting at
    1.97 odds, also calculates and plots net winnings for each punter
    '''
    # collect each punter's result in a list of dicts
    # (DataFrame.append was removed in pandas 2.0)
    results = []
    # loop through all punters
    for i in range(punters):
        # simulate coin flips
        df = coin_flips(n,odds)
        # calculate net
        net = df['net'].sum()
        # store this punter's result
        results.append({'odds':odds,'net':net})
        # plot the cumulative winnings over time
        df['cum_net'].plot(color=color,alpha=0.1)
    # create pandas dataframe from the punter results
    punter_df = pd.DataFrame(results,columns=['odds','net'])
    # check if punters ended up in profit
    punter_df['winning'] = np.where(punter_df['net']>0,1,0)
    return punter_df
The slightly more complicated many_coin_flips function uses the earlier coin_flips to loop through a group of punters, 100 by default, and save their results into a new pandas dataframe, punter_df, where we’ll assign a 1 to all punters who ended up in profit while all the losers get a 0. We also plot each punter’s cumulative net winnings with a nice red color to symbolise their (very) likely bankruptcy.
This function also returns a dataframe so running it will again make a lot of numbers pop up in the console, but it also plots out the financial fate of each punter, like this:
As we can see, there actually are a few of our 100 punters who got lucky enough to end up winning after 10,000 coin flips. But most of them ended up way below the break-even point, losing a lot of money. If this was a real group of punters we can only hope that even if they were stupid enough to set out betting on 10,000 coin flips at these odds, they’d at least at some point realise their mistake and quit.
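If you only care about how many punters finish in profit, and not about the path they take to get there, the whole simulation can be sketched in a few vectorised lines. This is an alternative of my own, not the approach used in the functions above: it draws each punter's number of winning flips straight from a binomial distribution, so it cannot plot the cumulative curves. The name share_in_profit and the seed parameter are hypothetical.

```python
import numpy as np

def share_in_profit(punters=100000, n=10000, odds=1.97, seed=None):
    """Estimate the share of punters in profit after n one-unit bets on a fair coin."""
    rng = np.random.default_rng(seed)
    # number of winning flips for each punter
    wins = rng.binomial(n, 0.5, size=punters)
    # each win returns `odds` units on a 1-unit stake, and every flip costs 1 unit
    net = wins * odds - n
    # fraction of punters whose final net is strictly positive
    return (net > 0).mean()

print(share_in_profit(seed=1))
```

Because no dataframes or plots are involved, this can handle far more punters than the loop above in a fraction of the time, which makes the estimate much less noisy.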
But how about if we change the offered odds? As I mentioned earlier, some books actually put up the fair odds of 2.00. How would 100 punters do after 10,000 coin flips betting at those odds? Well, we’ll have to write a new function for that. Also, just for fun (or to make a point) I’ve included an additional group of 100 punters lucky enough to be allowed to bet on the coin flips at odds of 2.03 – literally a license to print money.
def compare_odds(punters=100,n=10000,odds=[1.97,2.00,2.03]):
    '''
    Simulates and compare coin flip net winnings after 10000 flips for
    3 groups of punters, betting at odds of 1.97, 2.00 and 2.03,
    respectively. Also plots every punters net winnings
    '''
    # create figure and ax objects to plot on
    fig, ax = plt.subplots()
    # set y coordinates for annotating text for each group of punters
    ys = [0.25,0.5,0.75]
    # assign colors to each group of punters
    cs = ['r','y','g']
    # loop through the groups of punters, with their respective odds,
    # chosen color and y for annotating text
    for odd, color, y in zip(odds,cs,ys):
        # run coin flip simulation with given odds, plot with chosen color
        df = many_coin_flips(punters,n,odd,color)
        # calculate how many punters in the group ended up in profit
        winning_punters = df['winning'].mean()
        # set a text to annotate
        win_text = '%.2f: %.0f%%' %(odd,winning_punters * 100)
        # annotate odds and chance of profit for each group of punters
        ax.annotate(win_text,xy=(1.02,y),
                    xycoords='axes fraction',
                    color=color,va='center')
    # set title
    ax.set_title('Chances of ending up in profit after %s coin flips' %n)
    # set x and y axis labels
    ax.set_xlabel('Number of flips')
    ax.set_ylabel('Net profit')
    # add annotation 'legend'
    ax.annotate('odds: chance',xy=(1.02,1.0),
                xycoords=('axes fraction'),fontsize=10,va='center')
    # add horizontal line at breakeven point
    plt.axhline(color='k',alpha=0.5)
    # set y axis range at some nice number
    ax.set_ylim(-450,450)
    # show plot
    plt.show()
This last function makes use of the two previous ones to simulate the coin flips of our three groups of punters, plotting their total net winnings all on the same ax object, which we later make use of to add a title and some nice labels to the axes. We also add a horizontal line to be able to better compare the punters’ winnings with the break-even point, as well as some text annotation to explain the colors of the three groups.
Now, running the compare_odds() function in the IPython console will hopefully result in something like this:
Here we clearly see just how important betting at the highest odds really is. Keep in mind though that the numbers to the right are only rough estimates. As you can see, the yellow group of punters who bet at the fair odds of 2.00 did not win exactly 50% of the time, but close enough. I actually had to re-run the function a few times to get this close. But that’s only natural since we only had 100 punters, a very small number in this context, in each of our groups. The more punters and coin flips we use in our simulations, the closer we’ll come to the real win percentages – but here speed is more important than super accuracy.
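For this particular setup the real win percentages can also be worked out exactly, since a punter's final net depends only on how many of the flips he wins. The following is a minimal sketch under that assumption (fair coin, one unit staked on every flip). The function name prob_profit is mine, and the cut-off for the required number of wins relies on ordinary floating-point division.

```python
import math

def prob_profit(n=10000, odds=1.97):
    """Exact probability of a net profit after n one-unit bets on a fair coin."""
    # final net = wins * odds - n, so a profit requires wins > n / odds
    min_wins = math.floor(n / odds) + 1
    if min_wins > n:
        return 0.0
    # sum the binomial coefficients C(n, k) for k = min_wins .. n,
    # stepping from one coefficient to the next with exact integer arithmetic
    term = math.comb(n, min_wins)
    total = 0
    for k in range(min_wins, n + 1):
        total += term
        term = term * (n - k) // (k + 1)
    # every sequence of n fair flips has probability 1 / 2**n
    return total / 2 ** n

for odds in (1.97, 2.00, 2.03):
    print('%.2f: %.1f%%' % (odds, 100 * prob_profit(10000, odds)))
```

By my reckoning this gives roughly 6% at odds of 1.97, a little under 50% at 2.00 (finishing exactly level does not count as a profit) and roughly 93% at 2.03, which is in line with the picture the simulation paints.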
So as we clearly see in the above plot, betting on the coin flip at Pinnacle’s 1.97 odds really is a sucker bet, albeit an entertaining one if you were planning to watch the Super Bowl. But if you hope to make a profit from your betting, finding the highest available odds to bet on is crucial, as is shown by the green group of punters who were allowed to bet at odds of 2.03. It’s only a difference of 0.06, but it makes all the difference in the long run. The margins in betting are tiny, but they add up over time.
The lessons learned here can easily be transferred to sports betting in general and football betting in particular, where the Asian Handicap and Over/Under markets focus on odds around even money. The coin flip example is special though as we knew the true odds of the bet beforehand, something you’ll never be able to know betting on football. But as shown in the last plot, by consistently betting at the highest available odds, you at least give yourself a much better chance of ending up in profit.
Hi Zorba!
First, kudos on your new gig! (Though I’m bemused to hear you’re working for the books: you’re leaving money on the table, but I understand choosing stability.) I haunted your blog all year and had a great time reading it: there are so few numerate blogs on sports betting, and having a new Allsvenskan model of my own to pace against yours was a bonus. Looking forward to taking you on next year across the counter.
I wanted to leave your readers a few R functions I wrote that do much the same job as your code above, for the sake of diversity. (I have some other variance simulators too, but I’ve used the ones below for a while to keep myself sane during hellacious downswings.) Some of these depend on the others, as you can see in the code, so I keep them all in the same working environment.
This one simulates the flip of an unfair coin, where p is the probability of heads.
flipCoin <- function(p){
coin <- runif(1, min=0, max=1)
ifelse(coin < p, 1, 0)
}
If you want to simulate x unfair coins, then of course we use replicate.
flipCoins <- function(x, p){replicate(x, flipCoin(p))}
We're going to want to be able to bet these, so we need a function that gives us the Kelly stake for each edge. Here b is our bankroll, p is our probability of winning, z is Hong Kong odds, or (decimal odds – 1), and k is our desired Kelly fraction.
fk <- function(b,p,z,k){b * ((1 - (p*z/(1-p))^(-k)) / (1 + z*(p*z / (1-p))^(-k)))}
So if we have a bankroll of $1000, a bet that will win 55% of the time, decimal odds of 1.88, and enough confidence in our p value to bet 2/3 of Kelly, our function is fk(1000, .55, .88, 2/3), and returns the amount of money we should stake on our bet, ≈ $25.78.
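For readers following along in Python rather than R, here is a direct translation of fk as a hedged sketch. The comparison function full_kelly is my own addition and uses the textbook Kelly formula (edge divided by net odds). With k = 1 the two should agree, while for other values of k the formula above is close to, but not the same as, simply multiplying the full Kelly stake by k.

```python
def fk(b, p, z, k):
    """Python translation of the R function fk above.

    b: bankroll, p: win probability, z: Hong Kong odds (decimal odds - 1),
    k: desired Kelly fraction.
    """
    ratio = (p * z / (1 - p)) ** (-k)
    return b * (1 - ratio) / (1 + z * ratio)

def full_kelly(b, p, z):
    """Textbook full Kelly stake, added here only for comparison."""
    return b * (p * z - (1 - p)) / z

# the worked example from the text: should agree with the quoted stake
print(round(fk(1000, 0.55, 0.88, 2 / 3), 2))
# with k = 1 the formula reduces to the textbook Kelly stake
print(round(fk(1000, 0.55, 0.88, 1), 2), round(full_kelly(1000, 0.55, 0.88), 2))
```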
I use half Kelly so often that I wrote a shortcut function for that. It also takes out the bankroll value and simply returns the percentage of your roll that you should stake. (z is still Hong Kong, not Euro.)
fkez <- function(p, z){fk(100, p, z, .5)}
So if we have our 55% bet and decimal odds of 1.88, we run fkez(.55, .88), and are returned the percentage we should stake, ≈ 1.93%.
Now we can do something fun. Let's say that we're given x opportunities to flip our unfair coin that comes up heads with probability p, and we're always offered a line of z, where z = (decimal odds – 1). Betting our edge half Kelly, our results for the season can be found as follows:
seasonFlip <- function(x, p, z){
y = sum(FlipCoins(x, p))
print((1 + .01*fkez(p, z)*z)^y * (1 - .01*fkez(p, z))^(x - y))
}
Say I know I'll be able to bet 100 lines in Allsvenskan this year, each with p = .55 of winning and decimal odds of 1.88. I can then run seasonFlip(100, .55, .88) to find the factor by which my initial bankroll will be multiplied. When I ran this five times just now, I got ≈ 1.26, 1.51, 1.01, 1.05, and 0.88, respectively: my nets for the five simulated years were gains of 26%, 51%, 1%, and 5%, and a loss of 12%. It's not at all unreasonable to have a losing season (or a stressful breakeven) even with a nice edge over 100 bets! (If I had the fifth season, I might apply for a job with the bookies myself! ^_^)
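The same season simulation can be sketched in Python. This is a loose translation rather than a line-by-line port: the helper names are hypothetical, the stake is returned as a fraction instead of a percentage, and the seed is an arbitrary choice that only serves to make a run repeatable. Like the R version, it assumes every bet stakes the same half-Kelly fraction of the current bankroll, so the order of wins and losses does not matter.

```python
import random

def half_kelly_fraction(p, z):
    """Half-Kelly stake as a fraction of bankroll (same formula as fkez, divided by 100)."""
    ratio = (p * z / (1 - p)) ** (-0.5)
    return (1 - ratio) / (1 + z * ratio)

def season_multiplier(wins, x, p, z):
    """Factor by which the bankroll is multiplied after `wins` wins in x bets."""
    f = half_kelly_fraction(p, z)
    # each win grows the bankroll by f * z, each loss shrinks it by f
    return (1 + f * z) ** wins * (1 - f) ** (x - wins)

def season_flip(x, p, z, rng=random):
    """Simulate x bets that each win with probability p at Hong Kong odds z."""
    wins = sum(1 for _ in range(x) if rng.random() < p)
    return season_multiplier(wins, x, p, z)

rng = random.Random(2017)  # arbitrary seed, for repeatability only
print([round(season_flip(100, 0.55, 0.88, rng), 2) for _ in range(5)])
```

Splitting out season_multiplier makes it easy to check individual scenarios by hand, for instance what 45 wins out of 100 would do to the bankroll.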
Whoops, that last one should be
seasonFlip <- function(x, p, z){
y = sum(flipCoins(x, p))
print((1 + .01*fkez(p, z)*z)^y * (1 - .01*fkez(p, z))^(x - y))
}
with a lower case f in the camelback flipCoins! (Not sure how that uppercase F snuck in.)
Hey MV! Great stuff, thanks a lot!
Yeah, I get what you say about being on the ‘wrong’ side of the business but all I can say is that there are situations in life when you have to be really risk averse, and not for purely financial reasons. Stability is key for me and those around me right now. But as a matter of fact, I’m looking to bet more this year than I did the last, when I mostly coded and wrote. And now as a bonus, with a paycheck the swings will be so much easier to deal with.
Funny how you should mention Kelly, I was just putting a follow-up piece together in my head on how to visualise the risk of ruin for different staking plans like full and different fractions of Kelly.
No prob! If you’re up to correspond about this, let me know, maybe we can chat a bit more.
Not sure if you’ve already seen it, but for me the seminal paper on Kelly betting is Thorp’s:
Click to access Thorpe_KellyCriterion2007.pdf
It has a lot on the risk of ruin, and the relative prognoses of each Kelly fraction. If there’s anything I’ve learned from modeling and tracking many different sports, it’s that the pro bettors aren’t kidding when they call full Kelly (or even half Kelly) a rough ride: advantage or not, you WILL have 50%+ downswings and 100%+ upswings quite often. (That coin flipper in your post is a great example: the guy has no edge, but manages to win unabated from about flip #5500 to flip #7000.) Full Kelly carries such a high risk of overstaking that I can’t imagine ever using it in sports betting.
Also important when you’re betting something like football with so many fixtures is Kelly staking on simultaneous events, since a busy Saturday will so often leave you needing to stake more than 100% of your bankroll. Andrew Grant has a fine chapter (#19) on the topic in the Oxford Handbook on the Economics of Gambling:
https://books.google.com/books?id=-Wa-AAAAQBAJ&pg=PA369&dq=kelly+simultaneous+events&hl=en&sa=X&ved=0ahUKEwii0OO63ojMAhVmsYMKHTCjCxwQ6AEIHDAA#v=onepage&q=simultaneous&f=false
Sorry if this shows up as a doublepost: I tried to post something a moment ago, but my cookie jar was closed and the post might have vanished into the ether. (Luckily that’ll make me a bit more succinct on this go round.)
You’ve probably already read it, but on the offchance that you haven’t, the seminal paper on Kelly betting has to be Thorp’s: http://www.eecs.harvard.edu/cs286r/courses/fall12/papers/Thorpe_KellyCriterion2007.pdf
There’s a lot of gold in there on the risk of ruin and the relative merits of each Kelly fraction. Anyone out there betting football will also want to read up on simultaneous betting, since a busy Saturday will so often see you tasked with staking more than 100% of your bankroll. Andrew Grant has a fantastic chapter (#19) on the topic in the Oxford Handbook of the Economics of Gambling: bit.ly/2lIFuHn
That should be enough armor to soldier through a few vicious downswings (which, with full Kelly, are so nasty that I can’t imagine ever using it in sports betting: the risk of overstaking is intolerably high). The pro bettors aren’t kidding when they call Kelly a rough ride. Advantage or not, you will have 75% downswings and 400% upswings – just look at that coin flipper in your post! The guy is drawing dead, yet still manages to win unabated from bet #5500 to bet #7000.