500 Days of Numbers

I checked today on a whim, and was a little freaked out that it’s exactly 500 days since the first post on this site. That post was a bad joke, foreshadowing much of the work that was to appear on this site, but I thought I might have some leeway to indulge and [record scratch, freeze frame] talk about how I got here.

Out of Gas

In 2015 I found myself utterly burnt out. I’d learned programming at the age of six or seven, and it was something I loved and eventually made a career out of. Lately I’d founded a couple of startups, I even got to pitch at Downing Street one time, but between the exhausting schedules, the disappointments and the guilt of letting down investors and employees alike, I’d arrived at the point where I resented sitting and feeding the machine day after day. No child first stares up at a computer monitor hoping to one day make enterprise middleware. Yet here I was at 35, slightly confused at what had happened in the intervening years.

Instead of packing it all in and making wicker furniture for the rest of my life, I looked around for a hobby to salvage something of the only skill I really had. I toyed with the idea of making games, but I don’t really have the talent. Instead I settled on these here football stats. As I’ve said elsewhere, I was only really aware of football analytics because I followed Ted Knutson back when he wrote about Magic: the Gathering. Eventually StatsBomb happened and I was hooked. In October 2015 I grabbed some data and jumped in.

Shindig

Not a lot of people know this, but there’s a reason this blog has such a silly name. I applied for the 2016 Opta Pro Forum with a deep learning model that would basically [redacted] and [redacted], allowing you to measure [redacted] for any [redacted]. It got rejected, of course, but I was lucky enough to get an invite to the shindig itself and meet many of you for the first time, and ramble long into the night about numbers and hypothetical Netherlands lineups for Euro 2016. I’m grateful I got rejected in a way, because instead of spending that period doing something, y’know, hard, I could spend months just playing with the data. To this day I think this is something we don’t do enough of. Just take the data and cut it in fifty different ways and see how it looks, see what’s possible.

Trash

This approach of playing with data explains pretty much all the work I’ve done in public. What do attacks look like? What does defending look like? All these things derive from mucking about. It’s all entirely well-intentioned, but likely has no analytical value at all. We seem to struggle regularly as a community with this balance between people finding their feet and those expecting fully-fledged science. I feel like my work (and some of the stuff I’ve seen from, for example, David Sumpter) is as much about saying “hey! Look at what you can do with some freakin’ polygons!” as it is about making hard statements of fact. I like to think that some of my work is approachable, and possibly even salvageable for someone to build something cool upon, but I’ll admit that every year Sloan comes around I spend a week in despair that I ever presumed I had anything to offer the field of football analytics. As I pointed out in the last State of the Stats, we’re all vaguely in competition with each other so it’s hard, but to everyone out there experiencing their bout of impostor syndrome, I see you, you’re doing cool stuff!

War Stories

Impostor syndrome in mind, here’s a list of nice things that have happened to me in and around football:

  • Bobby Gardiner was my first ever follower, which is nice because I still regularly chat to him and enjoy his work.
  • The Challengers Podcast was the first thing to invite me on to ramble, and you should subscribe to them because they put out an absurd amount of content on a regular schedule.
  • I got to contribute to StatsBomb and I still feel guilty to this day when I put stuff on here while James works his arse off rustling up new work.
  • I repeated myself ramble-for-ramble on the Analytics FC podcast but with added puns.
  • I talked to my first Premier League club and utterly failed to convince them to give me a job.
  • A very nice and extremely patient guy called Jakub Dobias got in touch with me because he was trying to convince Slavia Prague to use analytics.
  • Our first signing (to a tiny degree based on PATCH) went on to win the African Cup of Nations and we don’t really get to take much credit for his incredibly hard work but I am incredibly smug about it as everyone I’ve boasted to knows.
  • I’m getting to travel to Copenhagen to judge a really cool hackathon!
  • I have a full time job in football analytics!

The Train Job

It felt sad when stats poster-boys Brentford/Midtjylland’s ruling boffins SmartOdds disbanded their football analytics department. But it paid off for me when Ted got the itch to get back into the game and started StatsBomb Services. I’ve been responsible for putting together some dead simple tools for clubs to do smart stuff: you’ll be familiar with Ted’s radars, but also stuff like the shot charts and passing maps we can generate at the touch of a button. There’s a lot of clever stuff in the pipeline for the next few months, much of which I expect will end up causing aggro in Ted’s mentions on Twitter, which is the measure of any good analytics work.

Objects in Space

So, that’s where I have managed to get in my brief time in football analytics. Given that it’s just been both the Opta Pro Forum and the Sloan Sports Analytics Conference in the last six weeks, I thought I’d talk a little bit about the future, and given that football’s just objects in space, where we might be headed with tracking data:

  1. My overall attitude to analytics in 2017 is the same as the old William Gibson quote: the future is already here — it’s just not very evenly distributed. US sports (within teams or in partnership with academia) are doing amazing work. It’s very likely that there are football clubs (and certainly betting syndicates) running silent with similarly incredible work, absolutely head and shoulders above what we see in public. Then there are clubs who have some solid spreadsheets to avoid obvious mistakes, followed by vast swathes of clubs doing everything as they already have. Some of the latter have good enough coaches and scouts that it doesn’t matter much at the moment.
  2. Foundational problems such as how to correctly feed tracking data into neural network models are largely unsolved. I think this is where most of the interesting work is happening. At the same time I’d be surprised if any of it was truly digestible inside clubs. There will always be a tension between big, smart, opaque models and small, simple, transparent metrics.
  3. It is dumb that data is still an issue in football. Not just event data but tracking stuff too. My hope is that some of the nascent work turning broadcast-quality footage into tracking data with machine learning will one day fundamentally alter the economics of the football data market, because it doesn’t seem like any of the leagues or football associations are going to.

The Message

I’m not gonna lie, at this point this is excruciatingly self-indulgent and I’m just stringing it out so I can reference more Firefly episodes, but the real message here is: I have got everything I could have hoped for out of football analytics, and you can too. It’s a genuinely fun, creative way to use my meagre programming skills, it’s a community of smart and often hilarious people, and it’s apparently even possible to pay your rent doing numbers. The feeling of watching tens of thousands of people, or indeed an entire nation, cheering on someone you had even a small part in moving from one club to another, from one country to another, is utterly thrilling. It’s been the most exciting 500 days in my life, and this is coming from someone that watched Oxford United win the Milk Cup in 1986, the same year I first started mashing the keyboard dreaming of one day making something cool.

My Stats #8245–8249 & 117

I spend a lot of time working on new models and metrics, watching games and generally mucking about with football stuff, but very little of it sees the light of day. Sometimes I fall out of love with an idea, sometimes it just doesn’t pan out, sometimes I hit the limit of what my brain (or for that matter my free time) can handle and give up.

Today, I’m going to take you on a tour of my drafts folder in WordPress, and as a weird form of primal scream therapy, I’ll give you a sample of some of the ideas I’ve had over the last few months and stalled on. I’ve no idea if this will be in any way useful or inspiring for people, but I hope at the very least that you’ll read it and think, “hey, that guy’s ideas are stupid, I could do this!”

As a festive but belated Easter bonus, I’ve also added a ‘probability of resurrection’ to each idea, so you can see which ones are victims of mere procrastination instead of actual shame.

The Path of Least Resistance

This idea is sort of the intersection of all the shot chart and PATCH stuff I’ve done – can you calculate and plot the areas where a team (or indeed a particular lineup) are weakest? Is it possible to visualise the path of least resistance, along which you’ll find it easiest to progress towards your opponent’s goal?

This isn’t supposed to sound grandiose, or like some universal metric that just tells you how to beat teams, but I genuinely think it would be great to have a visualisation that combined the shot and PATCH charts, to be able to get a feel – at a glance – for where your own team is weak, or where prospective opponents might be weak.

There are a few ways to do this. The first quick attempt I tried was purely visual: plotting big fat lines on a pitch wherever you conceded ball progression, overlaying them, and changing the colour of the overlaps as they get more and more used. This looked almost comically vomitous, so I paused to work on both a better model and visualisation.

Review: promising, until we got to the word ‘vomitous’.
Probability of resurrection: 6/10

Dangerous Dispossessions & Forward Retention

I spent a long time cobbling together stats for an extremely snarky piece about Everton’s ‘Fab Four’ of Barkley, Stones, Deulofeu and Lukaku. The general idea being, each player had bad habits, and we could judge Martinez by the degree to which those habits were being trained out. To be quite honest, three of those players have been fine and/or excellent this season so I cooled on the idea, plus I could never find the exact right metric to test against.

A couple of things I came up with were quite fun though. The first was ‘dangerous dispossessions’. Ross Barkley has spent large swathes of his career dribbling into trouble and losing the ball, and I started watching games with an eye for one thing: how many shots from counter-attacks did Everton concede when Barkley was dispossessed? The idea being, some players really shouldn’t be dribbling, because they give up more equity than they ever gain. For a second I thought I had him with this, he and Alexis Sanchez featured highly, but after I’d per ninetified everything and used xG instead of raw shots, Barkley stopped sticking out so much.
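
For anyone who wants to try the per-ninety version of this, here is a minimal sketch in Python. It is not the code behind the numbers above: the function name, the `(player, xg_conceded)` input shape and the minutes lookup are all assumptions made for illustration, and it presumes you have already worked out which opposition shots belong to the counter-attack after each dispossession.

```python
from collections import defaultdict


def dangerous_dispossessions_per90(dispossessions, minutes_played):
    """Sum the xG conceded after each player's dispossessions, scaled per 90 minutes.

    dispossessions: iterable of (player, xg_conceded) pairs, one per dispossession,
        where xg_conceded is the total xG of opposition shots in the counter-attack
        that followed (0.0 if there was no shot). Hypothetical input format.
    minutes_played: dict mapping player -> minutes on the pitch.
    """
    totals = defaultdict(float)
    for player, xg_conceded in dispossessions:
        totals[player] += xg_conceded

    # Players with no recorded minutes are skipped rather than dividing by zero.
    return {
        player: 90.0 * xg / minutes_played[player]
        for player, xg in totals.items()
        if minutes_played.get(player, 0) > 0
    }
```

Swapping raw shot counts in for `xg_conceded` (1.0 per shot) gives the cruder version of the metric, which makes it easy to see how much the ranking shifts once shot quality is accounted for.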

The second metric I looked at was ‘forward retention’, where you don’t just look at pass completion, you also look at the success of the player you’re passing to, the idea being that some players might play their team-mates into trouble. And then you’ll want to look at whether players are playing passes that are too safe and build a model to allow you to look at the risk vs reward of individual passes etc etc.
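
A rough sketch of how ‘forward retention’ could be counted, again in Python and again purely illustrative: the `(passer, completed, receiver_kept_ball)` tuples are an assumed input format, and deciding what counts as the receiver keeping the ball (their next action succeeding, say) is left to whoever prepares the data.

```python
from collections import defaultdict


def forward_retention(passes):
    """Share of each passer's attempts that were completed AND retained by the receiver.

    passes: iterable of (passer, completed, receiver_kept_ball) tuples.
        receiver_kept_ball is True when the receiving player's next action
        kept possession. Hypothetical input format.
    """
    attempts = defaultdict(int)
    retained = defaultdict(int)
    for passer, completed, receiver_kept_ball in passes:
        attempts[passer] += 1
        if completed and receiver_kept_ball:
            retained[passer] += 1
    return {passer: retained[passer] / n for passer, n in attempts.items()}
```

Comparing this number with plain pass completion for the same player is the interesting part: a big gap hints at someone who finds a team-mate but leaves them in trouble.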

Review: better if all this was subsumed into a more general model that looked at events on the pitch and their actual vs likely outcomes.
Probability of resurrection: 3/10

Peak xG

I was thinking about this partly as a way of measuring striker positioning, but also in light of Damien Comolli’s mention of judging defenders by interceptions on the Analytics FC podcast (about 30:52 in). Basically, you can sample along the line of an attempted cross or throughball or whatever, and calculate what could have been the maximum xG for a resulting shot. You can then hope to judge a striker’s positioning by whether they met the ball at its point of peak xG (or if they indeed exceeded it by taking a touch or whatever). You can also hope to judge defenders by measuring how dangerous a shot they prevented through an interception.
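
The sampling step can be sketched in a few lines of Python. Everything here is an assumption for illustration: the 0-100 pitch coordinates with the goal centre at (100, 50), the `peak_xg` name, and above all `toy_xg`, which is a made-up distance-decay placeholder standing in for whatever real xG model you have.

```python
import math

# Assumed coordinate system: both axes run 0-100, attacking goal centre at (100, 50).
GOAL_X, GOAL_Y = 100.0, 50.0


def toy_xg(x, y):
    """Placeholder xG that simply decays with distance from goal. Not a fitted model."""
    distance = math.hypot(GOAL_X - x, GOAL_Y - y)
    return math.exp(-distance / 10.0)


def peak_xg(start, end, xg_model=toy_xg, samples=50):
    """Sample evenly along the straight line of a pass and return (best_xg, best_point)."""
    best_xg, best_point = -1.0, None
    for i in range(samples + 1):
        t = i / samples
        x = start[0] + t * (end[0] - start[0])
        y = start[1] + t * (end[1] - start[1])
        value = xg_model(x, y)
        if value > best_xg:
            best_xg, best_point = value, (x, y)
    return best_xg, best_point
```

A striker’s actual shot xG can then be compared with `best_xg` for the pass that found them, and for an interception the value of `xg_model` beyond the interception point gives a rough idea of what the defender prevented. Treating the ball’s path as a straight line in two dimensions is itself a simplification.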

Review: probably very simplistic in a world with positioning data, but might be interesting to see a few numbers.
Probability of resurrection: 5/10

Pinball Charts

This was an alternative I imagined to the (rather busy looking, these days) PATCH charts, and part of my frustrated obsession with making charts as animated gifs. The idea was to plot the lines of an opponent’s attacking possession as it moved over the pitch, ‘activating’ defending players’ territories as the lines entered. Territory polygons would start faded out almost completely but become more visible when entered, a bit like a bumper lighting up when hit on a pinball table. If the possession ended in a territory, we’d make it more green (yay, you stopped an attack), if it passed through and out the other side, we’d make it more red (boo, you failed).

I didn’t get very far with this, if only because the graphics library I’ve been using for everything is a little hateful. But I think it would solve a lot of problems with charts that get very busy, and I’m eager to at least see people experiment with whether any useful information can be communicated with animation of this sort of data.

Review: this would probably annoy enough people on Twitter to be worthwhile.
Probability of resurrection: 9/10

Expected Yellows

Clubs are looking for any edge they can get in games, and I would love to build some referee models. The easiest to do with the data that’s out there is expected yellows: given a foul, what is the likelihood of a player being booked for it? Can we find more/less lenient refs, unfairly maligned players versus those immune to punishment, areas of the pitch where it’s safer to put in a professional foul? Could all be interesting, but there’s only about 1000-2000 cards a season depending on the league, and those for a variety of offences, so it’s quite difficult to pin down any patterns with confidence, and that’s before taking into account that the data doesn’t contain how dangerous a particular tackle was.
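
The simplest possible starting point is a booking rate per referee, pulled towards the overall rate so that refs with only a handful of fouls don’t look like outliers. The Python sketch below does that; the `(referee, was_booked)` input and the `prior_strength` value of 50 are assumptions picked for illustration, not tuned numbers, and it ignores everything the data is missing (such as how dangerous the tackle was).

```python
from collections import Counter


def booking_rates(fouls, prior_strength=50):
    """Per-referee probability of a yellow given a foul, shrunk towards the overall rate.

    fouls: iterable of (referee, was_booked) pairs, one per foul. Hypothetical format.
    prior_strength: how many "pseudo-fouls" at the overall rate each referee starts
        with. Larger values mean more shrinkage. Assumed default, not tuned.
    """
    fouls = list(fouls)
    if not fouls:
        return {}

    foul_counts = Counter(ref for ref, _ in fouls)
    card_counts = Counter(ref for ref, booked in fouls if booked)
    overall_rate = sum(card_counts.values()) / len(fouls)

    return {
        ref: (card_counts[ref] + prior_strength * overall_rate) / (n + prior_strength)
        for ref, n in foul_counts.items()
    }
```

The same grouping works for players or pitch zones instead of referees, though the small number of cards per season bites even harder once you slice that finely.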

Expected offsides would be another wonderful model to have if you were intent on destroying the beautiful game at all costs.

Review: not enough data to do a decent job at this stage.
Probability of resurrection: 4/10

Passing Variety

This is one of those weird ones where I’m sure someone already did this, but I may just be misremembering Marek Kwiatkowski’s article on StatsBomb about classifying central midfielders. Anyway, what I wanted to do was look at similar metrics to Marek, the pass direction and length, but see which teams had built midfields with a variety of passing styles, as opposed to just the same profiles across the board. Then of course you’d have to look at which approach actually worked better, or whether different lineups enabled teams to handle different opposition better etc. If this sounds familiar to anyone and they know the article I’m talking about, please get in touch so I know I’m not dreaming it.

Review: would be interested to read even if it already exists.
Probability of resurrection: 9/10

Corner Positions

I don’t remember ever making this, but it’s the only one of these that has code which worked first time, so I can actually give you some pictures. What you’re seeing here is players’ aerial performance from corners (straight from corners, whoever wrote this code never bothered to include headers after the first). Size is volume, colour is the ratio won and the centre of each player’s circle is their average position for aerial challenges. Left side of pitch is for corners from the left, right is for right, so picture them coming from the bottom of the screen.

Both Merseyside teams covering themselves with glory here.
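
The three things each circle encodes (volume, ratio won, average position) come down to one aggregation per player. This Python sketch is a reconstruction for illustration, not the forgotten original code, and the `(player, x, y, won)` input format is an assumption.

```python
from collections import defaultdict


def summarise_aerials(aerials):
    """Aggregate aerial challenges from corners into per-player chart values.

    aerials: iterable of (player, x, y, won) tuples, one per aerial challenge
        taken directly from a corner. Hypothetical input format.
    Returns player -> dict with volume (circle size), win_ratio (colour)
    and mean_x / mean_y (circle centre).
    """
    acc = defaultdict(lambda: [0, 0, 0.0, 0.0])  # count, wins, sum of x, sum of y
    for player, x, y, won in aerials:
        row = acc[player]
        row[0] += 1
        row[1] += 1 if won else 0
        row[2] += x
        row[3] += y

    return {
        player: {
            "volume": n,
            "win_ratio": wins / n,
            "mean_x": sum_x / n,
            "mean_y": sum_y / n,
        }
        for player, (n, wins, sum_x, sum_y) in acc.items()
    }
```

Running it separately for corners from the left and from the right gives the two halves of the chart.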

Review: a bit sparse, probably interesting to someone though.
Probability of resurrection: 6/10

CROTCH

This wasn’t actual work per se, but after dropping the Possession Adjusted bit from PATCH, and talking about it on the Analytics FC podcast, it occurred to me that CROTCH would be a magnificent acronym. Control Retained Over Territory something… something. Didn’t pan out, probably for the best.

Review: no.
Probability of resurrection: 0/10

Conclusion

I’d genuinely forgotten a couple of these until I went through old SQL stored in databases, so it’s been a useful process. By all means take any of the ideas above and run with it (or tell me if you’d desperately like to see it completed). In general I’m happy if you want to replicate anything on the blog as long as you credit me with a little inspiration.

In the meantime, I’ve still got plenty of things sitting in my drafts that I’m actively working on, so I haven’t included those, in the hope that they don’t fall into disrepair also. In fact, I ought to publish this before I forget about it.
