Here is Chelsea defending against West Brom in their 2-2 draw this season:
If I’m pointing you to this post from Twitter, it’s likely that you’ve asked, with varying degrees of alarm, what the hell you’re looking at with a chart like the one above. Because I’m terrible at making legends, here you go:
- This is a chart of how Chelsea defended in the game.
- Each shape is a player and represents their defensive ‘territory’ – the part of the pitch where they made tackles, interceptions, fouls etc.
- The player’s name is written in the centre of their territory, and you should be able to see that some names, and their associated shapes, are bigger or smaller, depending on how much a player ranges around the pitch.
- Each shape has a colour – this represents how much they allowed the opponent to progress through their territory: more green means the player was more of a brick wall, more red means they were more of a sieve.
- Above, you might see that Oscar put in a ton of work and claimed a large territory – we reward players who claim a lot of territory, which is why he’s more green than some of the players he shared space with, even though he let the same opposition moves through.
- Terry did not protect his space particularly well. Mikel and Fabregas provided little in the way of screening, and Matic, who replaced Fabregas, sat very deep but also offered little as they defended their lead.
Just as a quick sanity check on what you see above, WBA’s two goals came from a long shot from a huge empty space in front of Chelsea’s defence (left open by their midfield) and a move on Terry’s side of the penalty box:
Those are cherry-picked and don’t prove much, of course. No chart captures the entirety of a game, but hopefully you see that this is at least an interesting conversation starter to examine where Chelsea might have protected their territory better. Over the course of several games, you may notice the same patterns happening over and over again. At the same time, these are a great first stab at looking for weaknesses in an opponent’s lineup.
And that’s what you’re looking at. How does it work?
A while back I started looking at defence in terms of how a defender prevents their opponents operating in their territory. This included a metric called PATCH (“Possession Adjusted Territorial Control Held”… yeah), which underwent several changes without me really writing it up, despite publishing all sorts of cryptic charts on Twitter. So, my plan today is to go through the whole methodology as it stands today. There’s still work to do, and it’s by no means a hard and fast measure of good and bad defending, but it’s interesting enough to share and hope for some feedback.
PATCH is all about defensive territory – where on the pitch a player is responsible for stopping their opponent. We don’t measure this in an idealised way based on formations or anything like that, all we do is look at where a player is actually defending. We take all their defensive actions and draw a line around them – that’s their territory. In the previous version, we only looked at events in a team’s own half or danger zone, so the system wasn’t great at capturing defensive midfielders, who often defend higher up the pitch. That was a problem, but one we needed to solve without including noise from things like aerial challengers on attacking corners etc. It was also a problem that if a player put in even a single tackle in a weird place (a left back on the right wing etc) then the outline of their territory grew hugely.
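As a rough illustration of "drawing a line around" a set of defensive actions, here is a minimal sketch that assumes the outline is a plain convex hull and that the pitch is a 100 x 100 grid. Both are assumptions made for the example, as are the function names and the coordinates; this is not the code behind the charts.

```python
def convex_hull(points):
    """Andrew's monotone chain. Returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    n = len(vertices)
    if n < 3:
        return 0.0
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical defensive actions for one player (x, y on a 100 x 100 pitch).
actions = [(10, 10), (20, 10), (20, 30), (10, 30), (15, 20)]
territory_area = polygon_area(convex_hull(actions))

# The same player with a single tackle in a weird place.
stray_area = polygon_area(convex_hull(actions + [(20, 90)]))
```

With these made-up points the territory area is 200, and the one stray tackle stretches it to 500, which is the outlier problem described above.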
There are many ways to solve this, and I’ve experimented with a couple. The first was to find the average point of a defender’s defensive actions, and just trim to events within 1 standard deviation on the X and Y axes. The advantage of this is that it’s dead simple, very quick to do inside a database query, and the resulting area was still somewhat representative of where the player was on the pitch. But not representative enough: it was possible for players to completely disappear if their defensive actions were all taken in a large ring far enough away from the centre, and it occasionally wrongly accused players of retreating into a tiny territory. Here’s an old version of the Chelsea-WBA chart above. Look how tiny everyone is, especially John Terry:
I then experimented with a similar approach using the straight-line distance from the centre within the same sorts of bounds, but really this just gave you a slightly more circular version of the previous box. I finally settled on a decent compromise between ease of implementation and realism – I trim events to those within the 70th percentile of distance from the centre. Here’s another example, Tottenham’s 4-1 victory over Sunderland:
The one drawback over the previous version is that things look far busier, especially where there are overlaps, which is why I’ve started putting them on a black background, and increasing the transparency of lower-scoring players (because, you know, sieves are more see-through than brick walls). Departure from brand, I know, but probably more readable.
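To make the two trimming approaches above concrete, here is a minimal sketch of both. The function names, the use of population standard deviation, and the way the 70th percentile is rounded are all assumptions for the example; the real thing runs inside a database query and may differ in those details.

```python
import math
import statistics


def centre(points):
    """Average position of a set of (x, y) events."""
    xs, ys = zip(*points)
    return statistics.fmean(xs), statistics.fmean(ys)


def trim_std_box(points, k=1.0):
    """Keep events within k standard deviations of the centre on both axes."""
    cx, cy = centre(points)
    xs, ys = zip(*points)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return [(x, y) for x, y in points
            if abs(x - cx) <= k * sx and abs(y - cy) <= k * sy]


def trim_percentile(points, pct=70):
    """Keep the pct% of events closest to the centre (straight-line distance)."""
    cx, cy = centre(points)
    by_distance = sorted(points, key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
    keep = max(1, math.ceil(len(points) * pct / 100))  # assumed rounding rule
    return by_distance[:keep]


# Hypothetical player whose actions all sit in a ring around (50, 50).
ring = [(60, 50), (40, 50), (50, 60), (50, 40),
        (58, 58), (42, 58), (42, 42), (58, 42)]

box_kept = trim_std_box(ring)            # every event falls outside the box
percentile_kept = trim_percentile(ring)  # six of the eight events survive
```

The ring case shows the disappearing-player problem: the standard deviation box throws away every event, while the percentile trim always keeps a fixed share of them.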
Future avenues to look at are algorithms like local convex hulls, or more probabilistic approaches. You can certainly use some sort of kernel density approach, although I appreciate having hard boundaries to territory as it is. I might be willing to sacrifice the ease of visualising territory for a better approach, however, and I’ve been looking at a fairly complex system whereby you look at defensive events and opponent buildup in previous (representative) games, and use a Bayesian system to determine the degree to which we think a player would usually be defensively responsible in that situation. I’d love to hear any other approaches people have tried.
The original PATCH metric looked at how many opposition touches a defender allowed in their territory to judge how well they were doing, but this didn’t seem ideal. Some teams with a low block are happy for you to play in front of them to your heart’s content, as long as you don’t make any progress towards goal. Then there are some bad defences that just don’t take many touches to break through and score. So I’ve made a fundamental change here – we now measure ball progression through a defender’s territory. Whenever the ball is passed, or dribbled, or whatever combination of on-the-ball events happens, we look at how much progress the opposition have made towards the defending team’s goal. More than that, we look at the pace with which they’ve moved. Any player whose territory is intersected by the line of this progress gets blamed for it.
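A sketch of the blame step might look like the following. It assumes convex territories with vertices in counter-clockwise order, a 100 x 100 pitch with the defending goal at (0, 50), and that each opposition move can be reduced to a single straight line from start to end. The names and the example territories are hypothetical.

```python
import math

GOAL = (0.0, 50.0)  # assumed position of the defending team's goal


def progression(start, end, goal=GOAL):
    """Distance gained towards goal (negative if the ball moved away)."""
    return math.dist(start, goal) - math.dist(end, goal)


def _orient(a, b, c):
    """Positive if a -> b -> c turns counter-clockwise, negative if clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_touch(p1, p2, q1, q2):
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 == d2 == d3 == d4 == 0:
        return False  # collinear; a neighbouring edge catches real overlaps
    return d1 * d2 <= 0 and d3 * d4 <= 0


def _inside_convex(point, polygon):
    n = len(polygon)
    return all(_orient(polygon[i], polygon[(i + 1) % n], point) >= 0
               for i in range(n))


def intersects(start, end, polygon):
    """True if the line of progress starts in, ends in, or crosses the territory."""
    if _inside_convex(start, polygon) or _inside_convex(end, polygon):
        return True
    n = len(polygon)
    return any(_segments_touch(start, end, polygon[i], polygon[(i + 1) % n])
               for i in range(n))


def blamed_players(start, end, territories):
    """Names of every player whose territory the move passes through."""
    return sorted(name for name, poly in territories.items()
                  if intersects(start, end, poly))


# Hypothetical territories, not taken from the real chart.
territories = {
    "Terry": [(10, 40), (30, 40), (30, 60), (10, 60)],
    "Mikel": [(40, 20), (60, 20), (60, 40), (40, 40)],
}

gained = progression((70, 50), (20, 50))  # 50 units closer to goal
pace = gained / 5.0                       # assuming the move took 5 seconds
central = blamed_players((70, 50), (20, 50), territories)
wide = blamed_players((70, 30), (35, 30), territories)
```

Here the central move ends inside the first territory, so only that player is blamed, while the second move cuts straight through the other territory and out again. Note that every intersected player gets the full amount, however briefly the line clips their shape.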
So now we’re really measuring something directly relevant – a team moving towards your goal is getting into better and better shooting positions, and preventing, disrupting or postponing this is more or less the core of good defensive work. As ever, it’s not a metric based purely on defensive actions – we still use things like tackles to help mark out a player’s territory, and we hope that there are enough of these events to get an accurate picture. But we’re not judging them on those numbers – we’re judging them in far more direct terms, based on protecting their goal.
As with the previous metric, players are rewarded for the size of their territory, and then penalised for allowing the opposition into it, in this iteration based on ball progression. But the previous scores left me a little uncomfortable, with PATCH regularly recommending bad defences over good ones. I went back and looked in depth at the variables that went into the calculation, and especially the relationships between them.
The first thing I looked at was the possession factor, which was in there to account for the fact that teams without the ball can’t attack you. To be able to compare individual players from high and low possession teams, I normalised things to 50% possession. However, it’s not as simple as that, because you might expect high possession teams to have fewer opportunities to make defensive actions, so they’d on average have smaller territories. Rather than scratch my head over it, I just looked at the numbers. It was quickly obvious that possession really doesn’t have a reliable effect on a player’s territory. More surprisingly, the correlation with ball progression allowed is also extremely low. So, possession’s out. We’ll retcon the acronym later.
I also worried that players with large territories were being overly rewarded, and looked at a couple of different options like taking the root of the area. In the end, if you look at the data, it’s pretty much a linear relationship, but I’ve made the coefficients a little more accurate at least. I also looked at the degree to which minutes on the pitch affected defensive territory, and again, it’s almost impossible to find a reliable correlation. Therefore, only ball progression is weighted per 90.
So that’s the algorithm – get the area, divide it by ball progression, which you weight per 90 and by pace. The bigger your territory, the better you protect it, the higher you’ll score. It looks a little like this:
(k × Area) ÷ ((Total Ball Progression ÷ Minutes Played × 90) ÷ Average Progression Duration)
That’s the gist anyway.
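Spelled out as code, the formula above could be sketched like this. The function and parameter names are made up for the example, the value of k is not given here so the default of 1.0 is only a placeholder, and returning infinity when no progression was allowed is an assumption rather than part of the method.

```python
import math


def patch_score(area, total_progression, minutes_played,
                avg_progression_duration, k=1.0):
    """(k * area) / ((progression per 90) / average progression duration)."""
    progression_per_90 = total_progression / minutes_played * 90
    pace_weighted = progression_per_90 / avg_progression_duration
    if pace_weighted == 0:
        return math.inf  # assumed handling: nothing got through at all
    return (k * area) / pace_weighted


# Hypothetical numbers: same progression allowed, different territory sizes.
small = patch_score(area=300, total_progression=300,
                    minutes_played=90, avg_progression_duration=5)
large = patch_score(area=600, total_progression=300,
                    minutes_played=90, avg_progression_duration=5)
```

With the same progression allowed, doubling the territory doubles the score, which matches the linear relationship with area described earlier. Slower opposition moves (a longer average duration) also push the score up.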
This is the usual section where I list things I was too lazy to fix, but I promise I’m thinking about them:
- There are better ways to calculate territory, but not necessarily ones that can run inside an SQL query before I get bored.
- Players are blamed for ball progression no matter how much their territory is intersected by an opponent event. Even in the case where they hoof the ball way over your head, you still get blamed. Long term, I’d like to handle special cases like this, and assign degrees of blame to different territories.
- I’m aware that the gaps between territories are interesting – you can defend your territory brilliantly, but still be in the wrong place. Watch this space.
- Lots of goals, frankly, come from mistakes, which aren’t captured here.
- Different positions might want different approaches both to territory and scoring.
It’s also worth pointing out a few other people working in the same space. Sander at @11tegen11 naturally has a version, with scores based on the number of defensive actions:
Here’s how deep Leicester sat this match.
And how big N’Golo Kanté is for them. pic.twitter.com/iNm0Zafxii
— 11tegen11 (@11tegen11) February 6, 2016
And David Sumpter of @Soccermatics has similar charts looking at just ball recoveries, which is fascinating to study teams’ pressing approaches:
United retook possession a long way forward in last Chelsea match (areas are ‘typical’ positions of ball recovery) pic.twitter.com/Vaz54rV9cO
— Soccermatics (@Soccermatics) February 7, 2016
Happy to hear any other ideas people have!