Can AI predict the Premier League’s top scorer?
How often have we watched a game and wondered ‘how did he miss that?’ With expected goals (xG), we can now see whether our frustration is justified, and perhaps use that to predict future results.
By Haidar Altaie, Data Scientist at SAS
The use of analytics in sport is not a new concept. The average NBA fan is better equipped than ever with statistics to understand a game. Leaderboards cover not just points and assists but also blocks, steals, field goal percentages, free throw differentials and much more. These statistics are widely available and, most importantly, used and discussed in mainstream media.
By contrast, your average football fan would say ‘my eyes tell me everything I need to know’, largely because statistics have long been misused by fans.
What is Expected Goals?
Opta, the football data provider, describes xG as ‘the quality of a shot based on several variables such as assist type, shot angle and distance from goal, whether it was a headed shot and whether it was defined as a big chance’.
In other words, it uses historical averages to determine how likely it is that a chance should be scored. Adding together all these numbers can give us an indicator as to how many chances should have been scored, on average.
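As a rough illustration of ‘historical averages’, the sketch below groups past shots into broad situations, takes the conversion rate of each situation as its xG, and adds up the xG of the shots in one match. The situation names and the sample shots are invented for the example, and real models such as Opta’s use far more variables than a single bucket.

```python
from collections import defaultdict

# Hypothetical historical shots: (situation, was it scored?).
# These rows are invented for illustration, not real match data.
HISTORICAL_SHOTS = [
    ("penalty", True), ("penalty", True), ("penalty", True), ("penalty", False),
    ("close_range", True), ("close_range", False),
    ("long_range", True), ("long_range", False),
    ("long_range", False), ("long_range", False),
]


def conversion_rates(shots):
    """Return the share of shots scored in each situation (a crude xG per shot)."""
    attempts = defaultdict(int)
    goals = defaultdict(int)
    for situation, scored in shots:
        attempts[situation] += 1
        goals[situation] += int(scored)
    return {s: goals[s] / attempts[s] for s in attempts}


def total_xg(match_shots, rates):
    """Add up the xG of every shot taken in a match."""
    return sum(rates[situation] for situation in match_shots)


rates = conversion_rates(HISTORICAL_SHOTS)
match_xg = total_xg(["penalty", "close_range", "long_range", "long_range"], rates)
print(round(match_xg, 2))  # 0.75 + 0.5 + 0.25 + 0.25 = 1.75
```

In this toy match the side ‘should’ have scored about 1.75 goals on average, whatever the scoreboard actually said.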
xG lets us compare underlying performance with the actual results we see with the naked eye. It helps us comprehend the game and, if used correctly, predict future outcomes more accurately than any other measure.
When Mo Salah scored a record-breaking 32 goals to become the Premier League’s top scorer in the 2017/18 season, expectations were high for him to hit the ground running the following season. However, when he had scored only 3 goals in 8 games by mid-October, many described him as a ‘one-season wonder’, unable to replicate the previous season’s form. Using SAS Visual Analytics for visualisation, we were able to look at the xG metric.
Mo Salah was in fact leading the league in expected goals, yet he was nowhere near the top of the scoring chart: eight players had scored more than him. This suggested he was merely underperforming (or unlucky) rather than in decline.
Fast forward four months and 17 league games, and Mo Salah now sits joint top of the Premier League scoring chart. His goal tally has moved closer to his expected goals. Over a long run, underlying performance can tell us more than the output alone.
Just as we get 50:50 results between heads/tails when tossing a coin over a large sample size, xG tends to level out with actual goals scored.
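The coin-toss comparison can be sketched as a small simulation. It assumes, purely for illustration, that every shot is an independent event scored with probability equal to its xG (here a flat 0.15 per shot, a made-up figure), and prints how many goals are scored per unit of xG as the number of shots grows.

```python
import random


def simulate_goals(shot_xg_values, rng):
    """Score each shot with probability equal to its xG and count the goals."""
    return sum(1 for xg in shot_xg_values if rng.random() < xg)


def goals_per_xg(n_shots, xg_per_shot, rng):
    """Ratio of simulated goals to total xG for n_shots identical shots."""
    shots = [xg_per_shot] * n_shots
    return simulate_goals(shots, rng) / (n_shots * xg_per_shot)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
for n_shots in (20, 200, 20000):
    print(n_shots, round(goals_per_xg(n_shots, 0.15, rng), 2))
```

With a handful of shots the ratio can sit well above or below 1, which is the ‘hot’ or ‘cold’ streak fans notice. With many shots it tends to settle close to 1, the same way heads and tails even out over many tosses.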
Another example of how expected goals can ‘predict the future’ is the case of the iconic Leicester striker Jamie Vardy, who finished the 2017/2018 Premier League season with an impressive 20 goals. But what did his numbers suggest? Jamie Vardy outperformed his xG by nearly five goals.
xG suggests that a trend like this isn’t sustainable over a long period, even though Vardy kept it up consistently through a 38-game season. If we look at this season’s numbers so far, we see the reverse is true. So the effect of combining this season with last season (and extending the sample size) is that we revert towards the norm, where xG equals actual goals.
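The arithmetic behind combining seasons is simple enough to sketch. Only the 20 goals and the gap of ‘nearly five’ come from the article. The exact xG of 15.2 and the whole of the second season are placeholder numbers chosen for illustration, not Vardy’s real figures.

```python
def goals_minus_xg(goals, xg):
    """Positive means overperforming xG, negative means underperforming."""
    return goals - xg


seasons = [
    # 20 goals is from the article; 15.2 xG is an assumed value for 'nearly five' over.
    {"label": "last season", "goals": 20, "xg": 15.2},
    # Entirely hypothetical numbers for a season in which the trend reverses.
    {"label": "this season", "goals": 7, "xg": 10.0},
]

for season in seasons:
    diff = goals_minus_xg(season["goals"], season["xg"])
    print(season["label"], round(diff, 1))

combined_goals = sum(season["goals"] for season in seasons)
combined_xg = sum(season["xg"] for season in seasons)
combined_diff = goals_minus_xg(combined_goals, combined_xg)
print("combined", round(combined_diff, 1))
```

Under these assumed numbers, a gap of about +4.8 in one season and -3.0 in the next shrinks to about +1.8 over the combined sample, which is the reversion towards xG described above.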
Expected Goals has the potential to completely revolutionise analytics in football. It can open a whole new world where data can now tell a story, beyond just ‘reporting’ on the game, and it will only continue to get better as more and more data about shots can create more accurate models. However, it does come with limitations.
Scoring goals is a skill, and while one could rightfully argue that getting into scoring positions is more important than the finish, it’s still possible for one player to be better than others at putting the ball in the back of the net. That is something expected goals doesn’t consider, so the numbers need to be put in context.
Football is an open game with a high number of possibilities (which could perhaps be a reason why it’s behind on ‘useful’ analytics compared to other team sports). This means that accommodating only a handful of variables (shot type, location, angle) does not tell the full story. There are countless variables we fail to consider when calculating expected goals, and perhaps we will never be able to account for them with full accuracy.
And finally, the human factor. A player who has ‘underperformed’ could see their confidence suffer due to the lack of goals, which could genuinely dry up their output over the long run. Similarly, an overperforming run could boost someone’s confidence so they continue to improve. xG gives a great starting point but, as with all other statistics, it requires context. For example, if you look in more detail at the chances taken and chances missed by an individual player, you may be able to spot reasons why they score in certain situations and tend not to in others.
Despite these limitations, xG could help football catch up with the rest of the sporting world in terms of analytics. It will allow us to scout up-and-coming talent and see whether they’re overperforming or the real deal. Tactics can be altered to focus on chance creation over the long run as opposed to focusing purely on the next result.
Finally, it will allow us to open a new chapter in football analytics, where the use of AI can play a valuable role, as it’s already doing with SciSports. The Dutch sports analytics company gives a ‘quality’ rating to each individual player using model-based mathematical algorithms, based on the player’s contribution to the team’s result, all in real time.