PATCHing Teams

I explained the current PATCH methodology in my previous post. Today I’m going to do a deep dive into how PATCH views the current teams in the EPL. Here’s what the table looks like (pre-Southampton on Saturday):

Team | PATCH
Chelsea | 2.84
Manchester United | 2.79
Liverpool | 2.73
Tottenham Hotspur | 2.70
Manchester City | 2.68
Bournemouth | 2.64
Arsenal | 2.61
Southampton | 2.45
Leicester City | 2.38
Aston Villa | 2.34
Norwich City | 2.28
Watford | 2.27
Crystal Palace | 2.27
West Ham United | 2.27
Everton | 2.22
Swansea City | 2.18
Stoke City | 2.13
West Bromwich Albion | 2.11
Newcastle United | 1.97
Sunderland | 1.88

Each team's PATCH value is the 60th percentile of all individual performances for that team. You could, if you were highly motivated, work out the actual units for PATCH, but it's best treated as an abstract score: around 2 is average, just under 4 is the 90th percentile of player performances, and 5+ would be outstanding. Given that, I am reasonably happy with how this shapes up.
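Summarising a team by a percentile of its individual scores is a one-liner in most tools. As a minimal sketch, this assumes linear interpolation between ranks (NumPy's default convention); the post doesn't say which convention it uses, and the function names are illustrative.

```python
def percentile(values, p):
    """p-th percentile (0-100) using linear interpolation between sorted ranks."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def team_patch(player_scores, p=60):
    """Summarise a team by the p-th percentile of its individual PATCH scores."""
    return percentile(player_scores, p)
```

For example, `team_patch([1.0, 2.0, 3.0, 4.0, 5.0])` lands 40% of the way between the third and fourth scores, at roughly 3.4. Using a percentile rather than the mean keeps one freakishly large territory from dragging a team's figure around.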

PATCH Correlations

At first glance the numbers above don’t look bonkers, but how does the metric correlate with other team defensive stats? Let’s have a look:

Percentile | GA | xGA | Shots Against | SoT Against
10th | 0.06 | 0.10 | 0.08 | 0.06
20th | 0.22 | 0.30 | 0.20 | 0.15
30th | 0.28 | 0.44 | 0.38 | 0.30
40th | 0.29 | 0.56 | 0.54 | 0.44
50th | 0.27 | 0.61 | 0.63 | 0.45
60th | 0.27 | 0.71 | 0.76 | 0.55
70th | 0.19 | 0.65 | 0.71 | 0.48
80th | 0.16 | 0.60 | 0.62 | 0.38
90th | 0.10 | 0.47 | 0.50 | 0.29

Those are the R² values comparing each team's PATCH value at a given percentile (10th being the lowest 10%, i.e. the worst defensive performances) with some traditional measures. It's great to see that we're nicely correlated with expected goals against and shots, though I should point out that shots do directly go into the calculation – if you allow a shot through your territory, it's marked against you. However, that's only a small proportion of the gains measured. I tested with shots removed from the ball progression metric just to be sure, and the correlations barely went down.
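If each cell in that table comes from a simple one-variable linear fit across the 20 teams (an assumption; the post doesn't say exactly how the R² was computed), it equals the squared Pearson correlation. A minimal sketch:

```python
def r_squared(xs, ys):
    """R² of a one-variable least-squares fit, i.e. the squared Pearson correlation.

    Symmetric in xs and ys, so it doesn't matter which is treated as the predictor.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)
```

One thing worth keeping in mind: squaring discards the sign, so R² alone can't tell you that a higher PATCH goes with fewer goals against rather than more. A perfectly negative relationship scores 1.0, just like a perfectly positive one.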

Defensive Ranks

So far we’ve only looked at teams' performances en masse, as measured by PATCH. This is what things look like if we break them down by rank in a team’s formation:

[Chart: PATCH by defensive rank (defence, midfield, attack) for each team]

There are a few interesting patterns that immediately jump out:

  • Bournemouth’s attacking midfielders and forwards are doing a bunch of defensive work.
  • Manchester City’s less so.
  • Tottenham have the least penetrable midfield of any team in the league.
  • As you might expect, Leicester’s attack and defence are a little more robust than their midfield, reflecting the fact that they press high then retreat low.

Lingering on these numbers a little longer, I thought I’d compare them with someone else’s model for a further sanity check. Mark Thompson of Every Team Needs a Ron is one of my favourite writers, and is devoted to studying defenders in all their forms. He has a system to analyse how teams convert possessions into attacks, and attacks into shots, and how they allow their opponents to do the same. I compared the defensive rank data above with his data to see what the correlations were:

Measure | Defence | Midfield | Attack
Attacks per Possession | 0.65 | 0.47 | 0.40
Shots per Attack | 0.46 | 0.53 | 0.38

So, comparing the Attack, Midfield and Defence PATCH values from the graph above with Mark’s Attacks per Possession and Shots per Attack, we can get an idea of how much different parts of a team contribute to breaking up attacks. Defensive PATCH values explain 65% of the variance in opponent attacks per possession, whereas midfield explains a much lower 47%. This makes some sense: while a lot of teams would love their midfield to quash potential attacks before they happen, it’s far more common that attacks make it through to the last line. What’s interesting is the second row, where midfield performances explain shots per attack better than defence does. Again, I wonder if this comes down to poor shot quality – the defence don’t (and often don’t want to) stop low-expectation long shots. However, if your midfield are putting in a good screening performance, attackers won’t even get the space for bad shots.

That’s one explanation, anyway. At the very least I’m happy to see a decent correlation with someone else’s model.

Patchwork Defences

Defences are more than the sum of their parts. There are plenty of games where teams in aggregate can put in a great performance in terms of total or average PATCH values, but still be torn apart on the field. This happens often because of mistakes, which PATCH will probably never be able to account for, but it also happens because of weak links that let down the greater whole. Have a look at Manchester City from this weekend’s absolutely definitive title decider against Leicester:

[Chart: Manchester City's defensive territories against Leicester]

This is a fairly green chart – City policed a lot of territory, and in various parts of the pitch prevented Leicester from making regular gains. But look at their right-hand side: Otamendi didn’t score especially highly, and Zabaleta (who seems to be pushing quite far forward) scored even worse. Teams rightly fear Leicester’s right wing, because that’s where the formidable Mahrez nominally takes the field, but here we saw Mahrez pop up on the left a few times, including for Leicester’s 2nd goal, and Drinkwater also made some penetrative passes. We can see this from Leicester’s attacking chart for the day:

[Chart: Leicester's attacking chart against Manchester City]

Very left-leaning, with basically nothing on the right. Despite the fact that City conceded twice from set-pieces, you still saw scare after scare from open play. The combination of a weak defensive right-hand side, and players taking higher positions than was perhaps advisable against the league’s fastest counter-attacking team (still 2nd in Europe after Caen), meant that good PATCH scores in many parts of the pitch did not necessarily add up to a good defensive performance.

Weak Links

Given what we saw in the Man City vs Leicester game, perhaps we should judge a defence by its weakest link? After all, if they’re allowing lots of ball progression in their area, that’s obviously where the opposition are attacking, whether or not they’re thinking of that player as exploitable. If we just look at the lowest score for a defender in each game (using just those with 90+ minutes in a game to be safe), this is what teams come out looking like:

Team | Mean Weak Link PATCH
Chelsea | 2.32
Manchester United | 2.06
Arsenal | 1.99
Manchester City | 1.97
Liverpool | 1.78
Aston Villa | 1.77
Leicester City | 1.74
Tottenham Hotspur | 1.74
Southampton | 1.74
Bournemouth | 1.68
Swansea City | 1.62
Norwich City | 1.61
Crystal Palace | 1.59
West Bromwich Albion | 1.58
Watford | 1.54
Everton | 1.53
West Ham United | 1.48
Stoke City | 1.48
Newcastle United | 1.45
Sunderland | 1.45

Nothing radically different here. Perhaps I should be a little uncomfortable seeing Villa that high, but they have save percentage and shot creation issues, not necessarily an awful defence. That said, these numbers correlate less well with each of the four measures we compared against earlier, so the weak link view seems less representative.
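The weak link table boils down to "lowest qualifying defender per game, averaged over games". A minimal sketch, assuming a hypothetical record layout of `(game_id, player, minutes, patch)` tuples for one team's defenders:

```python
from statistics import mean


def mean_weak_link(performances, min_minutes=90):
    """Average, over games, of the lowest PATCH among defenders who played min_minutes+.

    performances: iterable of (game_id, player, minutes, patch) tuples
    (a hypothetical layout, not the post's actual schema).
    """
    lowest = {}
    for game_id, _player, minutes, patch in performances:
        if minutes < min_minutes:
            continue  # skip partial appearances, as in the post's 90+ minute filter
        if game_id not in lowest or patch < lowest[game_id]:
            lowest[game_id] = patch
    if not lowest:
        raise ValueError("no qualifying performances")
    return mean(lowest.values())
```

A substitute with a terrible 45 minutes is ignored here, which is the "to be safe" trade-off: it avoids small-sample noise but can hide a genuine weak link who was hooked at half time.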

Total Territory

PATCH fundamentally rewards defenders for claiming territory, so let’s look at what team characteristics we can pick up from a team's territory as a whole. Who uses the most space? Who leaves the most gaps?

This is total area per game of players’ defensive territory for each team, measured first as the sum of individual areas, then as a merged team area:

Team | Total Individual Area | Team Area
Arsenal | 12375 | 5286
Aston Villa | 12338 | 5590
Bournemouth | 12487 | 5540
Chelsea | 13385 | 5612
Crystal Palace | 14240 | 5750
Everton | 10494 | 5181
Leicester City | 14580 | 6078
Liverpool | 12817 | 5626
Manchester City | 12025 | 5405
Manchester United | 13156 | 5438
Newcastle United | 11767 | 5003
Norwich City | 11949 | 5614
Southampton | 12885 | 5525
Stoke City | 11464 | 5347
Sunderland | 12880 | 5560
Swansea City | 10882 | 5052
Tottenham Hotspur | 13520 | 5522
Watford | 12482 | 5779
West Bromwich Albion | 12943 | 5694
West Ham United | 14022 | 5693

Which looks like this:

[Chart: total individual area (X axis) against team area (Y axis) per game for each team, with trend line]

The X axis is the total individual area, which includes overlaps between players. The Y axis is the team shape, the area you get when you merge all the individual territories together and forget overlaps – also worth noting that the lower this value, the more empty spaces the team is leaving on the pitch.

It’s interesting because it reveals teams that are quite expansive in their defensive efforts (to the right are basically the pressers and aggressors, to the left is… Everton, asking very little of its defence). It also shows teams that have an overall compact defensive shape (Newcastle) versus those that push up more (Leicester, Watford). Above the trend line are teams with less overlap, below are those that are more crowded when defending.
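Both columns of the area table can be approximated from the players' territory polygons. This sketch sums individual areas exactly with the shoelace formula, but estimates the merged team area by sampling a grid of cell centres. That grid approach is a swapped-in approximation, not the post's method (a geometry library's polygon union would be exact), and the polygon coordinates are assumed to be in pitch units.

```python
import math


def polygon_area(poly):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def point_in_polygon(x, y, poly):
    """Ray casting: a point is inside if a ray to its right crosses an odd number of edges."""
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def team_areas(territories, cell=1.0):
    """Return (sum of individual areas, approximate area of their union)."""
    total_individual = sum(polygon_area(p) for p in territories)
    xs = [x for p in territories for x, _ in p]
    ys = [y for p in territories for _, y in p]
    min_x, min_y = min(xs), min(ys)
    nx = math.ceil((max(xs) - min_x) / cell)
    ny = math.ceil((max(ys) - min_y) / cell)
    covered = 0
    for i in range(nx):
        cx = min_x + (i + 0.5) * cell  # sample each grid cell at its centre
        for j in range(ny):
            cy = min_y + (j + 0.5) * cell
            if any(point_in_polygon(cx, cy, p) for p in territories):
                covered += 1
    return total_individual, covered * cell * cell
```

Two 10×10 territories overlapping by half give `(200.0, 150.0)`: the 50-unit difference is the overlap, which is what separates teams above and below the trend line. A smaller `cell` tightens the union estimate at the cost of more point tests.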

If we apply a similar sort of calculation to PATCH, we can take a team’s area and judge them not by the progression they allow through their territory, but by the progression that happens outside it. If we do that, these are the numbers we see:

Team | Outside Territory PATCH
Manchester City | 24.09
Liverpool | 23.55
Norwich City | 22.78
Southampton | 22.04
Leicester City | 20.77
Watford | 20.75
Tottenham Hotspur | 20.59
Aston Villa | 20.22
West Bromwich Albion | 20.19
Crystal Palace | 19.37
Bournemouth | 19.35
West Ham United | 19.26
Chelsea | 18.34
Manchester United | 17.94
Arsenal | 16.98
Swansea City | 16.51
Stoke City | 16.36
Sunderland | 15.96
Everton | 13.31
Newcastle United | 11.28

So Man City, Liverpool and… Norwich (apparently) allow the least progression outside their territory. Newcastle and Everton leave the biggest gaps for opponents to operate inside.

Getting Goal Side

Above you saw how a lot goes on in empty spaces. The thing that worries me most about PATCH, and particularly the approach I’ve taken to trimming events for territory, is space behind a defender. Perhaps we should leave in all goal side events for a defender? Even more, should we project their territory back to the goal line, or even the goalmouth itself?

Well, you’re going to have to wait to find out. In my next post I’m going to finally get around to looking at some individual player scores, and I’ll experiment with how defenders should be blamed for goal side events then.


Defending your PATCH

Here is Chelsea defending against West Brom in their 2-2 draw this season:

[Chart: Chelsea's defensive territories against West Brom]

If I’m pointing you to this post from Twitter, it’s likely that you’ve asked, with varying degrees of alarm, what the hell you’re looking at with a chart like above. Because I’m terrible at making legends, here you go:

  • This is a chart of how Chelsea defended in the game.
  • Each shape is a player; it represents their defensive ‘territory’ – the part of the pitch where they made tackles, interceptions, fouls etc.
  • The player’s name is written in the centre of their territory, and you should be able to see that some names, and their associated shapes, are bigger or smaller, depending on how much a player ranges around the pitch.
  • Each shape has a colour – this represents how much they allowed the opponent to progress through their territory: more green means the player was more of a brick wall, more red means they were more of a sieve.
  • Above, you might see that Oscar put in a ton of work and claimed a large territory – we reward players who claim a lot of territory, which is why he’s more green than some of the players he shared space with, even though he let the same opposition moves through.
  • Terry did not protect his space particularly well. Mikel and Fabregas provided little in the way of screening, and Matic, who replaced Fabregas, sat very deep but also offered little as they defended their lead.

Just as a quick sanity check on what you see above, WBA’s two goals came from a long shot from a huge empty space in front of Chelsea’s defence (left open by their midfield) and a move on Terry’s side of the penalty box:

Those are cherry-picked and don’t prove much, of course. No chart captures the entirety of a game, but hopefully you see that this is at least an interesting conversation starter to examine where Chelsea might have protected their territory better. Over the course of several games, you may notice the same patterns happening over and over again. At the same time, these are a great first stab at looking for weaknesses in an opponent’s lineup.

And that’s what you’re looking at. How does it work?

PATCH

A while back I started looking at defence in terms of how a defender prevents their opponents operating in their territory. This included a metric called PATCH (“Possession Adjusted Territorial Control Held”… yeah), which underwent several changes without me really writing it up, despite publishing all sorts of cryptic charts on Twitter. So, my plan today is to go through the whole methodology as it stands today. There’s still work to do, and it’s by no means a hard and fast measure of good and bad defending, but it’s interesting enough to share, and I'm hoping for some feedback.

Defensive Territory

PATCH is all about defensive territory – where on the pitch a player is responsible for stopping their opponent. We don’t measure this in an idealised way based on formations or anything like that, all we do is look at where a player is actually defending. We take all their defensive actions and draw a line around them – that’s their territory. In the previous version, we only looked at events in a team’s own half or danger zone, so the system wasn’t great at capturing defensive midfielders, who often defend higher up the pitch. That was a problem, but one we needed to solve without including noise from things like aerial challenges on attacking corners etc. It was also a problem that if a player put in even a single tackle in a weird place (a left back on the right wing etc) then the outline of their territory grew hugely.

There are many ways to solve this, and I’ve experimented with a couple. The first was to find the average point of a defender’s defensive actions, and trim events to those within 1 standard deviation on the X and Y axes. The advantage of this is that it’s dead simple, very quick to do inside a database query, and the resulting area was still somewhat representative of where the player was on the pitch. But not representative enough: it was possible for players to completely disappear if their defensive actions were all taken in a large ring far enough away from the centre, and it occasionally wrongly accused players of retreating into a tiny territory. Here’s an old version of the Chelsea-WBA chart above; look how tiny everyone is, especially John Terry:

[Chart: old version of the Chelsea-WBA defensive territory chart]
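Here's a minimal sketch of that first standard-deviation box trim, to show how a player can vanish. It assumes population standard deviation and keeps only actions within one SD of the mean on both axes; the function name and sample points are hypothetical.

```python
from statistics import mean, pstdev


def trim_to_sd_box(points, k=1.0):
    """Keep actions within k standard deviations of the mean point on BOTH axes."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return [(x, y) for x, y in points
            if abs(x - mx) <= k * sx and abs(y - my) <= k * sy]
```

Four actions in a ring, `[(10, 0), (-10, 0), (0, 10), (0, -10)]`, have a mean point at the centre and an SD of about 7.07 on each axis, so every action falls outside the box and the territory disappears entirely. A tight cluster plus one stray tackle, on the other hand, is handled well: the outlier is dropped and the cluster kept.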

I then experimented with a similar approach using the straight-line distance from the centre within the same sorts of bounds, but really this just gave you a slightly more circular version of the previous box. I finally settled on a decent compromise between ease of implementation and realism – I trim events to those within the 70th percentile of distance from the centre. Here’s another example, Tottenham’s 4-1 victory over Sunderland:

[Chart: Tottenham's defensive territories in the 4-1 win over Sunderland]

The one drawback over the previous version is that things look far busier, especially where there are overlaps, which is why I’ve started putting them on a black background, and increasing the transparency of lower-scoring players (because, you know, sieves are more see-through than brick walls). Departure from brand, I know, but probably more readable.
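The 70th-percentile trim, followed by drawing a line around what's left, might look like the sketch below. Assumptions, none confirmed by the post: the centre is the mean action point, the percentile uses the nearest-rank convention, and "drawing a line around" is read as a convex hull (Andrew's monotone chain) whose area comes from the shoelace formula. The post's own version runs inside a database query.

```python
import math


def trim_by_distance_percentile(points, pct=70):
    """Keep actions no further from the mean point than the pct-th percentile distance."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    # Nearest-rank percentile (an assumed convention): the ceil(pct% of n)-th smallest distance.
    rank = max(1, math.ceil(pct * len(points) / 100))
    cutoff = sorted(dists)[rank - 1]
    return [p for p, d in zip(points, dists) if d <= cutoff]


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula; vertex order may be clockwise or counter-clockwise."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def territory(points, pct=70):
    """Trim outlying actions, then outline what's left; returns (hull, area)."""
    hull = convex_hull(trim_by_distance_percentile(points, pct))
    return hull, polygon_area(hull)
```

Unlike the standard-deviation box, this always keeps at least the closest 70% of actions (more if distances tie), so a player can't disappear, while a single tackle on the wrong wing is usually the first thing trimmed.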

Future avenues to look at are algorithms like local convex hulls, or more probabilistic approaches. You can certainly use some sort of kernel density approach, although I appreciate having hard boundaries to territory as it is. I might be willing to sacrifice the ease of visualising territory for a better approach, however, and I’ve been looking at a fairly complex system whereby you look at defensive events and opponent buildup in previous (representative) games, and use a Bayesian system to determine the degree to which we think a player would usually be defensively responsible in that situation. I’d love to hear any other approaches people have tried.

Ball Progression

The original PATCH metric looked at how many opposition touches a defender allowed in their territory to judge how well they were doing, but this didn’t seem ideal. Some teams with a low block are happy for you to play in front of them to your heart’s content, as long as you don’t make any progress towards goal. Then there are some bad defences that just don’t take many touches to break through and score. So I’ve made a fundamental change here – we now measure ball progression through a defender’s territory. Whenever the ball is passed, or dribbled, or whatever combination of on-the-ball events happens, we look at how much progress the opposition have made towards the defending team’s goal. More than that, we look at the pace with which they’ve moved. Any player whose territory is intersected by the line of this progress gets blamed for it.
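A minimal sketch of the blame step follows. It assumes the defending goal lies on the line x = 0, measures progress as the reduction in distance to that line, and stores territories as polygons keyed by player name; all names are hypothetical. It ignores the pace weighting and the partial-blame refinements mentioned in the caveats: any territory the straight line of the move touches takes the full progress.

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def _orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a, b, p):
    """For p collinear with a-b: does p fall within the segment's bounding box?"""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    o1, o2 = _orientation(p1, p2, q1), _orientation(p1, p2, q2)
    o3, o4 = _orientation(q1, q2, p1), _orientation(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _on_segment(p1, p2, q1)) or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1)) or (o4 == 0 and _on_segment(q1, q2, p2)))


def touches_territory(start, end, poly):
    """True if the move starts or ends inside the polygon, or crosses any of its edges."""
    if point_in_polygon(*start, poly) or point_in_polygon(*end, poly):
        return True
    n = len(poly)
    return any(segments_intersect(start, end, poly[i], poly[(i + 1) % n]) for i in range(n))


def blame_progression(start, end, territories, goal_x=0.0):
    """Charge the progress made towards the goal to every territory the move touches."""
    progress = abs(start[0] - goal_x) - abs(end[0] - goal_x)
    if progress <= 0:
        return {}  # sideways or backwards moves aren't progression
    return {name: progress for name, poly in territories.items()
            if touches_territory(start, end, poly)}
```

Because the whole line of the move is tested, a long ball that sails over a full-back's territory still counts against him, which is exactly the "hoof over your head" case flagged in the caveats.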

So now we’re really measuring something directly relevant – a team moving towards your goal is getting into better and better shooting positions, and preventing, disrupting or postponing this is more or less the core of good defensive work. As ever, it’s not a metric based purely on defensive actions – we still use things like tackles to help mark out a player’s territory, and we hope that there are enough of these events to get an accurate picture. But we’re not judging them on those numbers – we’re judging them in far more direct terms, based on protecting their goal.

Scoring

As with the previous metric, players are rewarded for the size of their territory, and then penalised for allowing the opposition into it, in this iteration based on ball progression. But the previous scores left me a little uncomfortable, with PATCH regularly recommending bad defences over good ones. I went back and looked in depth at the variables that went into the calculation, and especially the relationships between them.

The first thing I looked at was the possession factor, which was in there to account for the fact that teams without the ball can’t attack you. To be able to compare individual players from high and low possession teams, I normalised things to 50% possession. However it’s not as simple as that, because you might expect high possession teams to have fewer opportunities to make defensive actions, so they’d on average have smaller territories. Rather than scratch my head over it, I just looked at the numbers. It quickly became obvious that possession really doesn’t have a reliable effect on a player’s territory. More surprisingly, the correlation with ball progression allowed is also extremely low. So, possession’s out. We’ll retcon the acronym later.

I also worried that players with large territories were being overly rewarded, and looked at a couple of different options like taking the root of the area. In the end, if you look at the data, it’s pretty much a linear relationship, but I’ve made the coefficients a little more accurate at least. I also looked at the degree to which minutes on the pitch affected defensive territory, and again, it’s almost impossible to find a reliable correlation. Therefore, only ball progression is weighted per 90.

So that’s the algorithm – get the area, divide it by ball progression, which you weight per 90 and by pace. The bigger your territory, the better you protect it, the higher you’ll score. It looks a little like this:

(k × Area) ÷ ((Total Ball Progression ÷ Minutes Played × 90) ÷ Average Progression Duration)

That’s the gist anyway.
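The formula above translates directly into a function. The post doesn't give the value of k, so `k=1.0` here is a placeholder, and units are whatever area, progression and duration are measured in; the function name is hypothetical.

```python
def patch_score(area, total_progression, minutes_played, avg_progression_duration, k=1.0):
    """(k * Area) / ((progression per 90) / average progression duration).

    k is a placeholder: the post's fitted coefficient isn't published.
    """
    if minutes_played <= 0 or total_progression <= 0 or avg_progression_duration <= 0:
        raise ValueError("minutes, progression and duration must all be positive")
    progression_per_90 = total_progression * 90 / minutes_played
    # Dividing by duration turns progression into a pace: quick, direct moves cost more.
    return (k * area) / (progression_per_90 / avg_progression_duration)
```

With `area=600`, `total_progression=300` over 90 minutes and an average duration of 2, the score is 600 ÷ (300 ÷ 2) = 4.0. Conceding the same progression in only 45 minutes doubles the per-90 figure and halves the score to 2.0, and conceding it twice as slowly doubles the score.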

Caveats

This is the usual section where I list things I was too lazy to fix, but I promise I’m thinking about them:

  • There are better ways to calculate territory, but not necessarily ones that can run inside an SQL query before I get bored.
  • Players are blamed for ball progression no matter how much their territory is intersected by an opponent event. Even in the case where they hoof the ball way over your head, you still get blamed. Long term, I’d like to handle special cases like this, and assign degrees of blame to different territories.
  • I’m aware that the gaps between territories are interesting – you can defend your territory brilliantly, but still be in the wrong place. Watch this space.
  • Lots of goals, frankly, come from mistakes, which aren’t captured here.
  • Different positions might want different approaches both to territory and scoring.

It’s also worth pointing out a few other people working in the same space. Sander at @11tegen11 naturally has a version, with scores based on the number of defensive actions:

And David Sumpter of @Soccermatics has similar charts looking at just ball recoveries, which is fascinating to study teams’ pressing approaches:

Happy to hear any other ideas people have!


Gary Neville’s Red Wedding

As an Evertonian, I was fascinated when Moyes went to Real Sociedad. The great Howard Kendall had enjoyed a wonderful spell in the Basque country, and Moyes seemed to start off comfortably enough. In Liverpool he won over the fans with his throwaway “People’s Club” comment; in San Sebastián all he had to do was eat some crisps.

It wasn’t to be a wildly successful stint for Moyes – he beat relegation and little else. But I kept an eye out for his results, if only because it took balls to take a job in Spain (and stay there) when Premier League clubs were calling. When Monday Night Football’s touch-screen savant Gary Neville was offered a Valencia job he had neither earned nor dreamt of, I was similarly impressed that he took the chance, but I felt even more intrigued – how could it be possible to learn on the job at a club of such magnitude?

Last night’s 7-0 massacre at the hands of Barcelona may prove to be Gary Neville’s Red Wedding moment: the young prince, crowned too early and unprepared, fatally outmanoeuvred with murderous efficiency by his more experienced enemies. But at least Robb Stark won some battles along the way – what has Neville done? Valencia are winless in the league, and 14 points off their results in the same 8 games last season – 1.75 points per game lower than their previous pace. Nuno Espírito Santo led them to 10 fewer points in his 13 league fixtures compared to last season, about 0.75 points per game off the pace. So you could argue things have got more than twice as bad under Neville, and that's before counting elimination from the Champions League, with the singular bright spot of the Copa del Rey now all but extinguished.

Even before last night, it’s been slightly painful to watch at times – Neville wasn’t just an insightful pundit, he was also clear about what kind of football he might want a team under his tutelage to play, and whom he hoped to emulate. He has made no secret of his admiration for Mauricio Pochettino, and clearly hoped to copy his high-pressure, high-energy approach. It’s possible there was a mix-up with the tapes, though, because in his first game, against Lyon in the Champions League, his team brought to mind one of Pochettino’s predecessors, André Villas-Boas, far more: time and again they were caught high as Lyon countered, carving open the defence of one of England’s most capped defenders.

This can be forgiven – a style that relies on pressing high up the pitch takes time to develop, and Pochettino has been given that time at Tottenham. There is no doubt that it’s paying dividends, as pointed out by Colin Trainor recently:

But with Neville engaged in a six-month audition, and Valencia only five points clear of relegation at this stage, has he had any success in moulding his young squad in his image? What are the hallmarks of Neville’s time at Valencia?

  • Is he defending high? Neville’s team are performing defensive actions less than one percentage point further up the pitch than Nuno’s (35.1% vs 34.4%), a difference which is nullified if you include the 2014 season.
  • Is he pressing more? Valencia have gone from 5.2 passes per defensive action to 5.5 under Neville, indicating less pressing.
  • Is their tempo higher? Attacking pace has gone from about 3.4 m/s to 3.6 m/s.
  • Has he perfected the wing play that Valencia want and expect from a Ferguson acolyte? Nope, same number of crosses per game on average (about 23), key passes slightly narrower if anything. He’s added a couple of successful dribbles per game, but having watched them, you’d expect that, as they rarely create any sort of overloads to offer a passing outlet.


I’ve watched them several times, and I’ll admit I am finding it hard to put a finger on what philosophy Neville has actually brought to Valencia. I asked on Twitter and nobody else seemed to have much of a clue either. Euan McTear wrote a decent piece looking at their numbers and some of Neville’s personnel changes, so I’m reluctant to go into much more depth in hope of finding answers, beyond the obvious fact that they’ve been a bit rubbish.

Rubbish but unlucky? On the face of it, expected goals doesn’t help the picture: I have them about -2.25 in expected goal difference during Neville’s stint, -0.95 under Nuno. However, it’s certainly fair to point out that Neville’s Valencia have been singularly unable to carve open a lead in the league, and perhaps this skews everything. To look into this, I ran 10,000 simulations of the shots from each of his games looking at the winner, but also the first scorer:

Home | Away | Home Score | Away Score | Home xG | Away xG | Home Win % | Draw % | Away Win % | Home Scores 1st | Away Scores 1st
Valencia CF | Sporting de Gijón | 0 | 1 | 2.07 | 1.26 | 56% | 25% | 19% | 76% | 23%
Deportivo de La Coruña | Valencia CF | 1 | 1 | 0.62 | 0.77 | 36% | 38% | 26% | 38% | 39%
Valencia CF | Rayo Vallecano | 2 | 2 | 1.14 | 1.72 | 25% | 24% | 51% | 20% | 76%
Real Sociedad | Valencia CF | 2 | 0 | 2.88 | 0.95 | 79% | 13% | 8% | 62% | 36%
Valencia CF | Real Madrid | 2 | 2 | 2.48 | 1.50 | 62% | 20% | 18% | 43% | 56%
Villarreal | Valencia CF | 1 | 0 | 0.43 | 0.66 | 20% | 43% | 37% | 30% | 38%
Valencia CF | Getafe | 2 | 2 | 1.04 | 0.71 | 42% | 34% | 24% | 58% | 27%
Eibar | Valencia CF | 1 | 1 | 2.78 | 0.53 | 89% | 9% | 3% | 96% | 3%
Valencia CF | Lyon | 0 | 2 | 1.19 | 1.26 | 38% | 29% | 34% | 38% | 55%

Note: the ‘score 1st’ columns don’t necessarily add up to 100% because of the possibility of nil-nil draws.
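The simulation described above can be sketched as a Monte Carlo replay in which every shot independently goes in with probability equal to its xG. This is an illustration under stated assumptions, not the code behind the table: shots are assumed to be `(minute, xg)` tuples (a hypothetical layout), ordered by minute to decide who scores first, and rebounds or game-state effects aren't modelled.

```python
import random


def simulate_match(home_shots, away_shots, n_sims=10_000, seed=None):
    """Replay each shot as a Bernoulli trial with p = xG; return outcome frequencies.

    home_shots / away_shots: lists of (minute, xg) tuples (hypothetical layout).
    """
    rng = random.Random(seed)
    shots = ([(m, xg, "home") for m, xg in home_shots]
             + [(m, xg, "away") for m, xg in away_shots])
    shots.sort(key=lambda s: s[0])  # chronological order decides the first scorer
    counts = {"home_win": 0, "draw": 0, "away_win": 0, "home_first": 0, "away_first": 0}
    for _ in range(n_sims):
        home = away = 0
        first = None
        for _minute, xg, side in shots:
            if rng.random() < xg:
                if side == "home":
                    home += 1
                else:
                    away += 1
                if first is None:
                    first = side
        if home > away:
            counts["home_win"] += 1
        elif home == away:
            counts["draw"] += 1
        else:
            counts["away_win"] += 1
        if first is not None:  # a 0-0 leaves both "first" counters untouched
            counts[first + "_first"] += 1
    return {key: value / n_sims for key, value in counts.items()}
```

The `first is not None` check is why the two "score 1st" columns can sum to less than 100%: every simulated nil-nil draw adds to the draw count but to neither first-scorer count.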

They conceded late – twice – to Real Sociedad, but deservedly so. They certainly could have beaten Real Madrid, the Villarreal result seems cruel, and perhaps a better result against Getafe was possible. And then last weekend, the game against Sporting Gijón was notable mostly for Negredo’s series of increasingly spectacular misses.

You would have expected them to nip the first goal somewhere along the line here, and it’s possible at that point all sorts of counter-attacking preparation that we’ve never seen, cooked up on Neville’s iPads, would kick in. That not being the case, at the very least you could argue, as Neville has, that Valencia’s performances coming from behind show they still have some fight. They’re third in La Liga for points after trailing, albeit with no wins, but last night’s awful result undoes this entire narrative, barring unimaginable heroics in the second leg.

To me, it looks increasingly like his 6am Spanish lessons are only going to be useful in saying his goodbyes this summer. Whether this proves to be a learning experience for him as a manager, or a big enough blow to his ego to send him back semi-permanently to punditry, remains to be seen.
