# PATCHing Teams

I explained the current PATCH methodology in my previous post. Today I’m going to do a deep dive into how PATCH views the current teams in the EPL. Here’s what the table looks like (pre-Southampton on Saturday):

Team PATCH
Chelsea 2.84
Manchester United 2.79
Liverpool 2.73
Tottenham Hotspur 2.70
Manchester City 2.68
Bournemouth 2.64
Arsenal 2.61
Southampton 2.45
Leicester City 2.38
Aston Villa 2.34
Norwich City 2.28
Watford 2.27
Crystal Palace 2.27
West Ham United 2.27
Everton 2.22
Swansea City 2.18
Stoke City 2.13
West Bromwich Albion 2.11
Newcastle United 1.97
Sunderland 1.88

The values for PATCH are the 60th percentile of all performances for each team. You could, if you were highly motivated, work out the actual units for PATCH, but treat it as abstract. 2 is around average, somewhere just under 4 is the 90th percentile amongst player performances and 5+ would be outstanding. Given that, I am reasonably happy with how this shapes up.
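For the curious, a minimal sketch of that aggregation – a team rating as the 60th percentile of its individual performance scores – might look like this. The scores below are invented, and the interpolation method is an assumption (it matches numpy's default linear interpolation), not necessarily what PATCH itself uses:

```python
def team_patch(performances, pct=60):
    """Return the given percentile of a team's player-performance scores."""
    scores = sorted(performances)
    # Interpolated percentile (equivalent to numpy's default 'linear' method)
    k = (len(scores) - 1) * pct / 100
    lo, hi = int(k), min(int(k) + 1, len(scores) - 1)
    return scores[lo] + (scores[hi] - scores[lo]) * (k - lo)

sample = [1.1, 1.8, 2.0, 2.3, 2.6, 2.9, 3.4, 4.1]  # hypothetical player scores
print(round(team_patch(sample), 2))  # → 2.66
```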

## PATCH Correlations

At first glance the numbers above don’t look bonkers, but how does the metric correlate with other team defensive stats? Let’s have a look:

Percentile GA xGA Shots Against SoT Against
10th 0.06 0.10 0.08 0.06
20th 0.22 0.30 0.20 0.15
30th 0.28 0.44 0.38 0.30
40th 0.29 0.56 0.54 0.44
50th 0.27 0.61 0.63 0.45
60th 0.27 0.71 0.76 0.55
70th 0.19 0.65 0.71 0.48
80th 0.16 0.60 0.62 0.38
90th 0.10 0.47 0.50 0.29

Those are the R² values comparing each team’s PATCH values at a certain percentile (10th being the lowest 10%, i.e. worst defensive performances) with some traditional measures. It’s great to see that we’re nicely correlated with expected goals against and shots, though I should point out that shots do directly go into the calculation – if you allow a shot through your territory, it’s marked against you. However, that’s only a small proportion of the gains measured. I tested with shots removed from the ball progression metric just to be sure and the correlations barely went down.
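As a rough sketch of how each row of that table could be produced – the R² between teams’ percentile PATCH values and one traditional defensive stat – here’s a small self-contained version. The figures are invented, and `r_squared` is just squared Pearson correlation, which is equivalent to the R² of a simple linear fit:

```python
def r_squared(xs, ys):
    """R² of a simple linear fit, i.e. squared Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov * cov / (vx * vy)

patch_60th = [2.84, 2.79, 2.73, 2.70, 2.68]  # hypothetical team PATCH values
xga        = [28.0, 30.5, 31.0, 29.5, 33.0]  # hypothetical xG against
print(round(r_squared(patch_60th, xga), 2))
```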

## Defensive Ranks

So far we’ve only looked at teams’ performances en masse, as measured by PATCH. This is what things look like if we break them down by rank in a team’s formation:

There are a few interesting patterns that immediately jump out:

• Bournemouth’s attacking midfielders and forwards are doing a bunch of defensive work.
• Manchester City’s less so.
• Tottenham have the least penetrable midfield of any team in the league.
• As you might expect, Leicester’s attack and defence are a little more robust than their midfield, reflecting the fact that they press high then retreat low.

Lingering on these numbers a little longer, I thought I’d compare them to someone else’s model for a further sanity check. Mark Thompson of Every Team Needs a Ron is one of my favourite writers, and is devoted to studying defenders in all their forms. He has a system to analyse how teams convert possessions into attacks, and attacks into shots, and how they allow their opponents to do the same. I compared the defensive rank data above with his data to see what the correlations were:

Defence Midfield Attack
Attacks per Possession 0.65 0.47 0.40
Shots per Attack 0.46 0.53 0.38

So, comparing the Attack, Midfield and Defence PATCH values from the graph above to Mark’s Attacks per Possession and Shots per Attack, we can get an idea of how much different parts of a team contribute to breaking up attacks. Defensive PATCH values explain 65% of the variance in opponent attacks per possession, whereas midfield is a much lower 47%. This makes some sense: while a lot of teams would love their midfield to quash potential attacks before they happen, it’s far more common that attacks make it through to the last line. What’s interesting is the second row, where midfield performances explain shots per attack better than defence. Again I wonder if this is bad shot quality – the defence don’t (and often don’t want to) stop low-expectation long shots. However, if your midfield are putting in a good screening performance, attackers won’t even get the space for bad shots.

That’s one explanation, anyway. At the very least I’m happy to see a decent correlation with someone else’s model.

## Patchwork Defences

Defences are more than the sum of their parts. There are plenty of games where teams in aggregate can put in a great performance in terms of total or average PATCH values, but still be torn apart on the field. This happens often because of mistakes, which PATCH will probably never be able to account for, but it also happens because of weak links that let down the greater whole. Have a look at Manchester City from this weekend’s absolutely definitive title decider against Leicester:

This is a fairly green chart – City policed a lot of territory, and in various parts of the pitch prevented Leicester from making regular gains. But look at their right-hand side: Otamendi didn’t score especially highly, and Zabaleta (who seems to be pushing quite far forward) scored even worse. Teams rightly fear Leicester’s right wing, because that’s where the formidable Mahrez nominally takes the field, but here we saw Mahrez pop up on the left a few times, including for Leicester’s 2nd goal, and Drinkwater also made some penetrative passes. We can see this from Leicester’s attacking chart for the day:

Very left leaning, basically nothing on the right. Despite the fact that City conceded twice from set-pieces, you still saw scare after scare from open play. The combination of a weak defensive right-hand side, and players taking higher positions than was perhaps advisable against the league’s fastest counter-attacking team (still 2nd in Europe after Caen), meant that good PATCH scores in many parts of the pitch did not necessarily add up to a good defensive performance.

Given what we saw in the Man City vs Leicester game, perhaps we should judge a defence by its weakest link? After all, if they’re allowing lots of ball progression in their area, that’s obviously where the opposition are attacking, whether or not they’re thinking of that player as exploitable. If we just look at the lowest score for a defender in each game (using just those with 90+ minutes in a game to be safe), this is what teams come out looking like:

Chelsea 2.32
Manchester United 2.06
Arsenal 1.99
Manchester City 1.97
Liverpool 1.78
Aston Villa 1.77
Leicester City 1.74
Tottenham Hotspur 1.74
Southampton 1.74
Bournemouth 1.68
Swansea City 1.62
Norwich City 1.61
Crystal Palace 1.59
West Bromwich Albion 1.58
Watford 1.54
Everton 1.53
West Ham United 1.48
Stoke City 1.48
Newcastle United 1.45
Sunderland 1.45

Nothing radically different here. Perhaps I should be a little uncomfortable seeing Villa that high, but they have save percentage and shot creation issues, not necessarily an awful defence. That said, these numbers correlate less well with each of the four measures we compared to earlier, so it seems less representative.
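For clarity, the weakest-link calculation described above – the lowest score among defenders with 90+ minutes in each game, averaged over the season – could be sketched like this (the match data is invented):

```python
def weakest_link(matches, min_minutes=90):
    """matches: one list of (patch_score, minutes) tuples per game."""
    per_game = []
    for defenders in matches:
        # Only consider defenders who played the full min_minutes
        eligible = [score for score, mins in defenders if mins >= min_minutes]
        if eligible:
            per_game.append(min(eligible))
    # Season value: average of the per-game minimums
    return sum(per_game) / len(per_game)

season = [
    [(2.4, 90), (1.9, 90), (3.0, 45)],  # the 45-minute sub is excluded
    [(2.1, 90), (2.6, 90)],
]
print(round(weakest_link(season), 2))  # → 2.0
```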

## Total Territory

PATCH fundamentally rewards defenders for claiming territory, so let’s look at what team characteristics we can pick up from their territory as a whole. Who uses the most space? Who leaves the most gaps?

This is total area per game of players’ defensive territory for each team, measured first as the sum of individual areas, then as a merged team area:

Team Total Individual Area Team Area
Arsenal 12375 5286
Aston Villa 12338 5590
Bournemouth 12487 5540
Chelsea 13385 5612
Crystal Palace 14240 5750
Everton 10494 5181
Leicester City 14580 6078
Liverpool 12817 5626
Manchester City 12025 5405
Manchester United 13156 5438
Newcastle United 11767 5003
Norwich City 11949 5614
Southampton 12885 5525
Stoke City 11464 5347
Sunderland 12880 5560
Swansea City 10882 5052
Tottenham Hotspur 13520 5522
Watford 12482 5779
West Bromwich Albion 12943 5694
West Ham United 14022 5693

Which looks like this:

The X axis is the total individual area, which includes overlaps between players. The Y axis is the team shape, the area you get when you merge all the individual territories together and forget overlaps – also worth noting that the lower this value, the more empty spaces the team is leaving on the pitch.

It’s interesting because it reveals teams that are quite expansive in their defensive efforts (to the right are basically the pressers and aggressors, to the left is… Everton, asking very little of its defence). It also shows teams that have an overall compact defensive shape (Newcastle) versus those that push up more (Leicester, Watford). Above the trend line are teams with less overlap; below are those that are more crowded when defending.
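A rough sketch of the two area measures, using a coarse grid approximation rather than the real territory polygons: summing each player’s cells counts overlaps twice (total individual area), while the union of all cells is the merged team shape. The rectangular territories below are invented stand-ins for actual player territories:

```python
def areas(territories, cell=1):
    """territories: list of (x0, y0, x1, y1) rectangles, one per player."""
    total_individual, covered = 0, set()
    for x0, y0, x1, y1 in territories:
        cells = {(x, y)
                 for x in range(x0, x1, cell)
                 for y in range(y0, y1, cell)}
        total_individual += len(cells) * cell * cell  # overlaps counted per player
        covered |= cells                              # union discards overlaps
    return total_individual, len(covered) * cell * cell

# Two overlapping 10x10 territories sharing a 5x10 strip:
print(areas([(0, 0, 10, 10), (5, 0, 15, 10)]))  # → (200, 150)
```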

If we apply a similar sort of calculation to PATCH, we can take a team’s area and judge them not by the progression they allow through their territory, but by the progression that happens outside it. If we do that, these are the numbers we see:

Team Outside Territory PATCH
Manchester City 24.09
Liverpool 23.55
Norwich City 22.78
Southampton 22.04
Leicester City 20.77
Watford 20.75
Tottenham Hotspur 20.59
Aston Villa 20.22
West Bromwich Albion 20.19
Crystal Palace 19.37
Bournemouth 19.35
West Ham United 19.26
Chelsea 18.34
Manchester United 17.94
Arsenal 16.98
Swansea City 16.51
Stoke City 16.36
Sunderland 15.96
Everton 13.31
Newcastle United 11.28

So Man City, Liverpool and… Norwich (apparently) allow the least progression outside their territory. Newcastle and Everton leave the biggest gaps for opponents to operate inside.

## Getting Goal Side

Above you saw how a lot goes on in empty spaces. The thing that worries me most about PATCH, and particularly the approach I’ve taken to trimming events for territory, is space behind a defender. Perhaps we should leave in all goal side events for a defender? Even more, should we project their territory back to the goal line, or even the goalmouth itself?

Well, you’re going to have to wait to find out. In my next post I’m going to finally get around to looking at some individual player scores, and I’ll experiment with how defenders should be blamed for goal side events then.

# Christmas Shopping: Goalkeepers

The nights are getting longer up here in the Northern Hemisphere, and soon children will be donning their traditional transfer window jumpers and gathering around open fires to sing traditional transfer window songs. In preparation for the festive season, I’m going to think about teams with really obvious deficiencies, and work out what Santa’s elves might be able to fax over on deadline day to fix them.

We’re going to start with goalkeepers, because frankly it’s easiest to draw up a naughty list of rubbish keepers using our expected saves model. Below is the list of all keepers that have on average underperformed in the last five seasons, i.e. they’ve made fewer saves than the expected saves model predicted. The rating is simply saves over expected saves, times 100. 100 is a keeper that saved exactly what the model thought they should; over is good, under is bad.
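The rating is simple enough to sketch directly – the numbers here are Guzan’s 2015/16 figures from the detailed table further down:

```python
def keeper_rating(saves, expected_saves):
    """Saves over expected saves, times 100: >100 good, <100 bad."""
    return 100 * saves / expected_saves

print(round(keeper_rating(39, 43.4), 1))  # → 89.9
```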

An aside as an Everton fan: I am going to note here that the player just above this list, who only just scraped a rating of 100.1, is Tim Howard. I don’t believe he’s as bad as most Everton fans like to make out (he’s just above Joe Hart in this year’s ratings, basically in the middle of the pack), but those that want to play along can by all means picture my recommendations below as applying to Everton as well (or indeed whichever team you happen to support). Just note that whoever Everton might get in will be facing the second most shots of any keeper in the Premier League, and mistakes will be made.

Keeper 2010 2011 2012 2013 2014 2015 Avg
Simon Mignolet 99.3 101.5 107.7 96.6 98.2 93.3 99.4
Julian Speroni 103.1 95.1 99.1
Tom Heaton 98.7 98.7
Richard Kingson 98.7 98.7
Ben Hamer 98.2 98.2
Ali Al-Habsi 102.2 100.2 92.1 98.2
Matthew Gilks 98.0 98.0
John Ruddy 100.4 100.1 98.3 92.9 97.9
Robert Elliot 92.3 97.6 102.9 97.6
Brad Friedel 94.4 101.7 96.4 97.5
Kasper Schmeichel 98.7 95.9 97.3
Costel Pantilimon 90.9 104.5 96.4 97.3
David Marshall 97.2 97.2
Paulo Gazzaniga 103.6 90.3 97.0
Tim Krul 87.2 101.2 99.7 101.0 95.4 95.3 96.6
Boaz Myhill 85.4 108.8 87.4 104.9 96.1 96.5
Thomas Sørensen 99.4 99.7 89.9 96.3
Steve Harper 97.4 87.0 96.4 104.5 96.3
Mark Bunn 96.3 96.3
Marcus Hahnemann 96.2 96.2
Robert Green 97.2 91.6 99.9 96.2
Gerhard Tremmel 102.5 88.2 95.3
Wayne Hennessey 95.6 100.4 89.9 95.3
Anders Lindegaard 107.0 83.3 95.1
Brad Guzan 98.1 97.8 92.6 96.7 89.9 95.0
Joel Robles 92.5 96.0 94.3
Kelvin Davis 90.5 96.9 93.7
Paul Robinson 101.3 85.9 93.6
Artur Boruc 95.3 100.4 85.0 93.5
Scott Carson 93.1 93.1
Patrick Kenny 91.4 91.4
Allan McGregor 93.7 87.4 90.5
Maarten Stekelenburg 93.9 83.1 88.5
Dorus de Vries 85.8 85.8
Stuart Taylor 81.2 81.2

There are a few main things I want to note here:

1. Southampton have terrible taste in keepers – Boruc, Davis, Stekelenburg, all generally underperforming expected saves. Fraser Forster may come good, but until then, Southampton’s overall organisation is covering up a lack of quality between the posts.
2. Bournemouth are in real trouble – Boruc isn’t great (not shown here is his 3 mistakes leading to goals already this year), and Adam Federici hasn’t done much better, but he’s left off this table as he’s below the 10-save cutoff. On top of these fairly poor performances is the fact that the shots Bournemouth are allowing are far, far trickier than any other team in the league (0.42xg against Boruc, 0.48 against Federici, against a league average of about 0.3), so literally anyone in their goalmouth would struggle.
3. Brad Guzan is the only keeper to consistently, year after year, underperform expected saves but keep his place. The 100-based ratings actually boost him up the table a bit – in terms of raw goals above/below expected, Guzan is last this year, last in 2013, and firmly bottom 6 every season he plays. That’s partly Aston Villa’s woeful defence, but I do not know how Guzan has kept his place for so long.

Of this year’s relegation candidates, Robert Elliot, standing in for Tim Krul at Newcastle, is the only keeper performing above expected saves, by a teeny 0.3-goal margin. Pantilimon at Sunderland is poor but not the worst, Bournemouth would probably benefit more from a defensive shakeup to reduce the quality of chances conceded, and I think that leaves Aston Villa as the prime candidates for an upgrade. I might argue in a future post that their defence needs patching (*cough* Alan Hutton *cough*), but they’re conceding chances with an average 0.25xg, which isn’t terrible. Guzan, however, is four goals down on where he should be this season and, if history’s anything to go by, he’s going to continue leaking goals. This is the last five seasons in detail:

Season Mins Shots Saves Goals Save % Expected Saves +/- Expected Shot Difficulty Rating
2015/16 1134 58 39 19 67% 43.4 -4.4 25.2 89.9
2014/15 3201 148 101 47 68% 104.5 -3.5 29.4 96.7
2013/14 3570 167 110 57 66% 118.8 -8.8 28.9 92.6
2012/13 3385 174 114 60 66% 116.6 -2.6 33.0 97.8
2011/12 620 26 18 8 69% 18.3 -0.3 29.4 98.1

So it kinda goes without saying, looking at the historical data above, that Villa could have sorted this out over the Summer, or last year, or the year before. But we’re entering a hypothetical world here where teams might agree to sell their first-choice goalkeeper in the January window, and those keepers might agree to join a team at or near the bottom of the Premier League, plus or minus any sort of reaction that Remi Garde gets between now and then. Let’s assume that nobody is going to drop down from a team above Villa to help out, otherwise I’d probably just point at Jack Butland and be done with it. Villa have been bringing in youth over the Summer, so let’s look at keepers 25 and under in Europe, playing at teams not currently in European competition, with decent ratings from our model. Let’s just assume that Premier League TV money is enough to land one of these targets. Who’s out there?

Keeper Mins Shots Saves Goals Save % Expected Saves +/- Expected Shot Difficulty Rating
Timo Horn 4155 231 177 54 76.6% 165.2 11.8 28.1 107.2
Gerónimo Rulli 2922 133 93 40 69.9% 87.9 5.1 33.1 105.8
Julián 1491 101 75 26 74.3% 71.1 3.9 20.2 105.5
Loris Karius 6208 349 257 92 73.6% 244.8 12.2 29.6 105.0
Benjamin Lecomte 4968 246 178 68 72.4% 171.0 7.0 29.1 104.1
Alphonse Areola 4386 183 131 52 71.6% 126.3 4.7 33.2 103.7
Marco Sportiello 4881 269 197 72 73.2% 191.1 5.9 30.2 103.1
Mattia Perin 9428 548 388 160 70.8% 380.0 8.0 30.6 102.1
Nicola Leali 3587 195 135 60 69.2% 132.9 2.1 31.3 101.6
Oliver Baumann 10447 573 406 167 70.9% 404.9 1.1 29.2 100.3

I’ve snuck Alphonse Areola in here despite the fact that he’s on a season-long loan, just because he is/was vaguely available in principle. Any of these players, dead or alive, would probably be an improvement, and it seems like the transfer rumour mill, and potentially even Villa’s scouts, are ahead of me: they’ve been linked with Mainz’s Karius, and indeed Timo Horn. I don’t have Championship data, or smaller foreign leagues, so I will rely on those of you with eyes to fill me in there.

It’s worth noting that these numbers perhaps miss important parts of a modern goalkeeper’s game: Paul Lambert certainly rated Guzan’s distribution, so we ought to look into that. Here’s everybody’s overall passing numbers:

Keeper Passes Completed Ratio
Oliver Baumann 4898 3096 0.63
Timo Horn 1537 953 0.62
Loris Karius 2633 1560 0.59
Gerónimo Rulli 973 574 0.59
Marco Sportiello 1616 931 0.58
Alphonse Areola 1383 798 0.58
Nicola Leali 1122 651 0.58
Mattia Perin 3167 1839 0.58
Benjamin Lecomte 1690 939 0.56
Julián 473 234 0.49

And here’s everything over 40 yards:

Keeper Passes Completed Ratio
Gerónimo Rulli 666 295 0.44
Nicola Leali 779 337 0.43
Marco Sportiello 1034 417 0.40
Julián 370 149 0.40
Oliver Baumann 2592 940 0.36
Benjamin Lecomte 1064 378 0.36
Timo Horn 857 305 0.36
Loris Karius 1527 548 0.36
Alphonse Areola 836 290 0.35
Mattia Perin 1845 641 0.35

So Guzan has 5% over Timo Horn on long balls, take it or leave it.

It remains to be seen whether Aston Villa’s transfer window tree will be sheltering a Timo Horn-shaped present this holiday season – I nearly ran the numbers on January goalkeeper transfers to see if it happened that regularly – but I’ll leave that for the more enterprising of you. It’s possible these targets have been approached and Villa have neither the ambition nor the spending power to land any of them. All you can ask for in your letters to Lapland this year is that Remi Garde gets Villa’s Summer signings to gel into some sort of attacking unit, Jack Grealish stops being peak-Ross Barkley wasteful, and someone keeps putting their face in the way of the ball.

# Visualising Centre of Gravity

Gab Marcotti, talking on the excellent Analytics FC podcast a couple of weeks ago, mentioned a stat that’s sometimes talked about in Europe, but less so over here in the UK – a team’s centre of gravity. This is a single point on the field that shows the team’s average position. It might not seem hugely revealing taken in isolation, but comparing teams, matches, or seasons, you can see some movement and draw some conclusions.

I thought about ways to visualise this, and thought just a couple of dots on an empty pitch would be a little underwhelming, but then I realised we could go a bit further. I want to try to create the simplest possible visualisation that can give you the general gist of a team’s positioning and passing, and so I’ve calculated the following:

1. A team’s centre of gravity in each match. This is just the average of all that team’s touches in the match.
2. The average starting position of that team’s passing, which will hopefully show roughly where the team is using the ball and starting moves. I’ll call this pass origin.
3. The average ending position of the team’s passes, which might reveal the length of their passes, or a preference for one wing over the other. Note that sideways passes can cancel each other out, so you’ll see most lines pointing through the middle – later we might want a way to show a team’s preference for forwards versus sideways passing, but we don’t have it here. Anyway, I’ll call this ending position of passes pass destination.
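The three averages above reduce to simple means over coordinates – a sketch, with invented touch and pass positions, and with goal kicks and drop kicks assumed already filtered out:

```python
def mean_point(points):
    """Average (x, y) of a list of coordinates."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

touches = [(30, 40), (50, 35), (45, 60), (55, 45)]          # all touches
passes = [((30, 40), (50, 50)), ((45, 60), (60, 30))]       # successful (start, end)

centre_of_gravity = mean_point(touches)
pass_origin = mean_point([start for start, end in passes])
pass_destination = mean_point([end for start, end in passes])
print(centre_of_gravity, pass_origin, pass_destination)
```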

I’ve calculated these by using all touches, but only successful passes, and I’ve removed goal kicks and drop kicks because I feared they would somewhat skew the numbers. Let’s tell the tale of two matches and see if we can make sense of them – here’s Arsenal’s 5-2 win over Leicester from September:

I’ll jazz these up at some point, but for now, here’s what you’re seeing: the graph represents the middle third of the middle third of the pitch, so a rectangular block in the middle, the centre spot being where the dotted axes cross. The start of the dotted line is the centre of gravity, the average of a team’s touches. The end of the dotted line, and the start of the solid line, is the pass origin – the average position in which passes were attempted. And the end of that line is the pass destination, the average position of a pass completion. The arrows show the direction of play. So what conclusions would we draw at a glance here?

• Leicester seemed to be pushing up near the half way line. Some of this might be score effects – they only led for 5 minutes, and trailed from 33 mins onwards. But some might just be a decision to play a high line against a good attack, a decision we’ve seen blow up before.
• Leicester’s passing was longer than Arsenal’s. Perhaps you’ll remember the long ball that led to Vardy’s opener.
• Arsenal leaned a little to the left of the pitch, which was, incidentally, hat-trick scorer Alexis Sanchez’s side that day.

Whether or not that tells much of a story of the match on the day is up for debate, obviously I can interpret the lines to fit the facts but that proves little. But let’s compare to another Arsenal game, this time the 3-0 win over Manchester United:

Well, this is a slightly different Arsenal – much deeper, with Man Utd camped out in the Arsenal half, chasing a 3-0 deficit after 20 minutes. You’d be forgiven for interpreting the deeper Arsenal play and longer balls as a superb counter-attacking performance, but I think that’s misleading – they dominated totally early on, and then shelled effectively. That identifies straight away a risk with these visualisations, and indicates I should probably be at least splitting them into halves, or even better dividing them up for each goal. Perhaps it’s possible to fit that on one graph, who knows.

Wanna see a real counter-attacking performance? Here you go:

Deep again, even longer balls forwards, and Bayern compressed here as you might expect when you average their 600-odd passes.

So, I think this is an interesting way of glancing at games. It doesn’t reveal the whole story, but I think with a bit more granularity, it’s an interesting kicking off point for a game’s narrative. When I’ve got these automated I’ll start putting them out for each game, and there are a few comparisons I’d like to make in the future, e.g. Barcelona’s evolution over the last few years. If anyone has any specific requests or wants some limited data to play with their own visualisations, just get in touch.

And finally, another shout out to the Analytics FC podcast which inspired the work here!