Bringing analytics to the football transfer window

6th August 2019
Posted By : Anna Flockett

The football season came to an end after an intense year, with Manchester City claiming the Premier League title after a tight race against Liverpool. It is hard to believe that Liverpool's 97 points would have won 25 of the last 27 Premier League titles. Luckily for them, the Reds won the Champions League, redeeming themselves after losing last year's final.

Football is over on the pitch for the season, but the battle of the transfer season has just begun. Money shapes major football leagues all over the world, despite rare successes by lower-budget teams such as Leicester City in 2016. Teams across Europe are changing the outcome of their domestic leagues with massive transfer budgets, and the transfer season is when teams shape their potential for the next season. Sports analytics is often used to analyse teams on the pitch, but it can be brought to the transfer season as well. We now have a chance to analyse the upcoming transfer season using mathematical optimisation and the capabilities of SAS Viya, and that is exactly what Sertalp Cay, Operations Research Specialist at SAS, does here.

Analytics for transfer season
In football, players can move between clubs during the transfer season. If they are out of contract, clubs can acquire them and sign a contract. Otherwise, the current contract needs to be terminated before any transfer. In this case, the purchasing team pays an amount called the transfer fee.

"How should we allocate our transfer budget to maximise the benefit we gain?" This is the ultimate question that every team needs to answer. (Teams often try to answer "Which player should we get to make our fans happy?" instead, but no one truly knows what could make fans happy.) Maximising benefit under a limited resource is known as the knapsack problem in combinatorial optimisation. Given a set of items with values and weights, the knapsack problem is to find the selection of items that fits within a weight limit and maximises the total value. We can ask a similar question here: given a set of players with their values and ratings, which players should a team transfer to maximise total team rating within a budget limit?

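To make the knapsack analogy concrete, here is a minimal sketch of the classic 0/1 knapsack problem solved with dynamic programming. The rating gains and fees are hypothetical numbers chosen for illustration only, and this is not the model used later in the post, which also handles positions.

```python
def knapsack(values, weights, capacity):
    """Return (best_value, chosen_indices) for the 0/1 knapsack problem."""
    n = len(values)
    # best[k][c] = best value using the first k items within capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        v, w = values[k - 1], weights[k - 1]
        for c in range(capacity + 1):
            best[k][c] = best[k - 1][c]
            if w <= c:
                best[k][c] = max(best[k][c], best[k - 1][c - w] + v)
    # Walk back through the table to recover the chosen items
    chosen, c = [], capacity
    for k in range(n, 0, -1):
        if best[k][c] != best[k - 1][c]:
            chosen.append(k - 1)
            c -= weights[k - 1]
    return best[n][capacity], sorted(chosen)


# Hypothetical players: rating gain and transfer fee in million euros
gains = [9, 5, 4, 3]
fees = [17, 30, 12, 8]
best_gain, picked = knapsack(gains, fees, capacity=40)
print(best_gain, picked)
```

Here the "weight" of an item is the transfer fee and the "value" is the rating gain, so the capacity plays the role of the transfer budget.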
Even though writing a detailed mathematical model of the problem is challenging, I will show how a simple model can be written to benefit from the capabilities of optimisation. Before we dive any further, note that we are solving a simplified problem under the following assumptions to make things easier:

  • We consider only the starting lineup to measure team ratings
  • Teams can transfer any player as long as their current value is paid
  • We only focus on acquiring players, not selling them
  • Teams use the same formation for the next year
  • Players can be played only at the positions they are listed in the data set

Data
One of the most challenging stages of any analytical problem is to obtain clean data. At this point, we are lucky to have a great web resource: sofifa.com. SoFIFA has more data than we need for this problem. By using parallel web requests, we managed to create a database of 12,000 players sorted by their overall rating. The web scraper is available on GitHub and the data are available as a CSV file.

As an important side note, since these models are being run on data based on the football game FIFA, not on real player metrics, they are a better reflection of the players in the computer game, not the players in real life. However, these same concepts can be applied to real player data if you have access to it.

Model
Our aim is to maximise the sum of player ratings in the starting lineup of teams. We will solve the problem separately for each team. For each position, we filter the list of players who have a better rating than what the team currently has. Then, the increase in the total rating is used to measure the performance of the transfer for the team.
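The filtering step described above could look roughly like the following sketch. The data layout (a list of dictionaries with `name`, `positions` and `overall` keys) and all values are assumptions made for illustration; the actual data set may be organised differently.

```python
# Hypothetical toy data; names, positions and ratings are made up
current = {"RB": 73, "CB": 80}  # position -> rating of the current starter
players = [
    {"name": "A", "positions": ["RB"], "overall": 82},
    {"name": "B", "positions": ["RB", "CB"], "overall": 78},
    {"name": "C", "positions": ["CB"], "overall": 79},
]


def eligible_pairs(players, current):
    """Keep (player, position) pairs that would raise the position's rating."""
    return [
        (p["name"], pos)
        for p in players
        for pos in p["positions"]
        if pos in current and p["overall"] > current[pos]
    ]


pairs = eligible_pairs(players, current)
print(pairs)
```

Only pairs that improve on the current starter survive, which keeps the optimisation problem small.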

Let us define $P$ as the set of all players, $S$ as the set of team positions, and $E$ as the set of player-position pairs. The following parameters are used to define the problem:

  • $\bar{R}_j$: Current rating of the player at position $j$
  • $R_i$: Overall rating of player $i$
  • $B$: Team budget
  • $V_i$: Transfer value of player $i$

The main decision variable $t_{ij}$ is binary and indicates whether player $i$ is transferred for position $j$. We also have an auxiliary variable $r_j$ to define the final rating for position $j$ in the formation.

The objective function can be written as the summation of the final ratings:

$$\text{maximise} \quad \sum_{j \in S} r_j$$

Our first constraint is the budget for the transfer. The total value of players transferred cannot exceed the team budget:

$$\sum_{(i,j) \in E} V_i \cdot t_{ij} \le B$$

The next constraint defines the final rating for each position. It accounts for transferred player $i$ replacing the current player at position $j$:

$$r_j = \bar{R}_j + \sum_{i \in P : (i,j) \in E} (R_i - \bar{R}_j) \cdot t_{ij} \quad \forall j \in S$$

The following two constraints ensure that a player cannot be transferred for two different positions, and that at most one player is transferred for a given position:

$$\sum_{j \in S : (i,j) \in E} t_{ij} \le 1 \quad \forall i \in P$$

$$\sum_{i \in P : (i,j) \in E} t_{ij} \le 1 \quad \forall j \in S$$
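To check the formulation on a tiny instance without a solver, the constraints above can be enumerated directly. This is only a brute-force sketch on hypothetical data (player names, ratings and values are made up) and it does not scale to real data sets, where a proper MILP solver is needed.

```python
from itertools import product


def solve_by_enumeration(current, overall, value, elig, budget):
    """Enumerate every feasible transfer plan and return the best one.

    current: position -> rating of the current starter
    overall: player -> overall rating, value: player -> transfer value
    elig: list of (player, position) pairs, budget: spending limit
    """
    positions = list(current)
    # For each position: None (keep the starter) or one eligible player
    options = [[None] + [i for (i, j) in elig if j == pos] for pos in positions]
    best_total, best_plan = sum(current.values()), {}
    for choice in product(*options):
        bought = [i for i in choice if i is not None]
        if len(set(bought)) < len(bought):  # a player fills one position only
            continue
        if sum(value[i] for i in bought) > budget:  # budget constraint
            continue
        total = sum(
            current[pos] if i is None else overall[i]
            for pos, i in zip(positions, choice)
        )
        if total > best_total:
            best_total = total
            best_plan = {pos: i for pos, i in zip(positions, choice) if i is not None}
    return best_total, best_plan


# Hypothetical instance
current = {"RB": 73, "CB": 80, "CM": 78}
overall = {"A": 82, "B": 85, "C": 84}
value = {"A": 17, "B": 40, "C": 30}
elig = [("A", "RB"), ("B", "CB"), ("B", "CM"), ("C", "CM")]
total, plan = solve_by_enumeration(current, overall, value, elig, budget=60)
print(total, plan)
```

Choosing at most one option per position enforces the one-transfer-per-position constraint by construction, while the duplicate check enforces the one-position-per-player constraint.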

Python Model

We model this problem using sasoptpy, an open-source Python interface of SAS Optimisation.

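import sasoptpy as so

# Assumes session, POSITIONS, PLAYERS, ELIG, value, overall, member and budget are already defined
m = so.Model(name='optimal_team', session=session)

# Final rating of each position; binary transfer decision for each eligible pair
rating = m.add_variables(POSITIONS, name='rating')
transfer = m.add_variables(ELIG, name='transfer', vartype=so.BIN)

# Maximise the total rating of the starting lineup
m.set_objective(
    so.quick_sum(rating[j] for j in POSITIONS), name='total_rating', sense=so.MAX)

# Total value of transferred players cannot exceed the budget
m.add_constraint(
    so.quick_sum(transfer[i, j] * value[i] for (i, j) in ELIG) <= budget, name='budget_con')

# Final rating of a position: current member, or the transferred player if any
m.add_constraints((
    rating[j] == overall[member[j]] + so.quick_sum(
        transfer[i, j] * (overall[i] - overall[member[j]]) for (i, j2) in ELIG if j==j2) for j in POSITIONS), name='transfer_con')

# A player can be transferred for at most one position
m.add_constraints((
    so.quick_sum(transfer[i, j] for (i2, j) in ELIG if i==i2) <= 1 for i in PLAYERS), name='only_one_position')

# At most one player can be transferred for each position
m.add_constraints((
    so.quick_sum(transfer[i, j] for (i, j2) in ELIG if j==j2) <= 1 for j in POSITIONS), name='only_one_transfer')

m.solve()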

Notice that it is very easy to model this problem using the Python interface. Our open-source optimisation modelling package sasoptpy uses the runOptmodel action under the hood, as shown in examples in the documentation. If you are familiar with PROC OPTMODEL, you can write the SAS code and run it on SAS Viya directly.

Results
We have run the optimal transfer problem for the top six teams in the Premier League standings: Manchester City, Liverpool, Chelsea, Tottenham, Arsenal and Manchester United. The current team and budget information are obtained from SoFIFA at the time of execution. We filtered out all players older than 33, since the majority of players reach their peak before 33 and steadily lose performance afterwards.

See the table below for a comparison between optimal transfers for each team. The positions of the transfers are given in the following figures below the table.

| Team | Old Rating | Avg | New Rating | Avg | Budget | Money Spent | Efficiency | Transfers |
|---|---|---|---|---|---|---|---|---|
| Manchester City | 944 | | 972 | | €170.0M | €170.0M | | Giorgio Chiellini, Thiago Emiliano da Silva, Jordi Alba Ramos, C. Ronaldo dos Santos Aveiro |
| Liverpool | 932 | | 949 | | €90.0M | €89.5M | | Łukasz Piszczek, Giorgio Chiellini, Sergio Busquets Burgos |
| Chelsea | 925 | | 948 | | €95.0M | €94.0M | | Samir Handanovič, Giorgio Chiellini, Thiago Emiliano da Silva, Marco Parolo |
| Tottenham Hotspur | 933 | | 949 | | €85.0M | €82.0M | | Filipe Luís Kasmirski, Marco Parolo, Sergio Busquets Burgos |
| Arsenal | 905 | | 933 | | €92.5M | €90.0M | | Lars Bender, Giorgio Chiellini, Filipe Luís Kasmirski, Fernando Luiz Rosa |
| Manchester United | 915 | | 951 | | €175.0M | €174.0M | | César Azpilicueta Tanco, Giorgio Chiellini, Thiago Emiliano da Silva, Filipe Luís Kasmirski, Luka Modrić |

As mentioned above, we do not consider the likelihood of the transfer itself. We consider what money could buy if teams are able to get players at their current valuation.

Manchester City increases its total team rating by 28 points, from 944 to 972, if it spends all of its current transfer budget of €170M. It is not surprising to see that with a rather limited budget of €90M, Liverpool can increase its total rating by only 17 points, whereas Manchester United's total team rating can increase by 36 points with its massive budget of €175M.

The efficiency column is calculated by dividing the change in total rating by the total money spent in million euros. We expect the efficiency of the transfers to be larger when a few players have significantly lower ratings than the rest of the team and can be replaced with rather cheap alternatives. Arsenal has the highest efficiency and can increase its total rating by 0.31 points per million euros by purchasing 4 players.
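The efficiency calculation can be reproduced from the old rating, new rating and money spent reported in the table. The sketch below simply applies the definition above to those figures; the variable names are my own.

```python
# (old rating, new rating, money spent in million euros), taken from the results table
results = {
    "Manchester City": (944, 972, 170.0),
    "Liverpool": (932, 949, 89.5),
    "Chelsea": (925, 948, 94.0),
    "Tottenham Hotspur": (933, 949, 82.0),
    "Arsenal": (905, 933, 90.0),
    "Manchester United": (915, 951, 174.0),
}

# Efficiency = change in total rating per million euros spent
efficiency = {
    team: (new - old) / spent for team, (old, new, spent) in results.items()
}
most_efficient = max(efficiency, key=efficiency.get)

for team, eff in sorted(efficiency.items(), key=lambda kv: -kv[1]):
    print(f"{team:18s} {eff:.2f}")
```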

The reason why Liverpool's total rating does not increase as much as Arsenal's, despite similar transfer budgets, lies in the variation of player ratings. For Arsenal, the rating of the right back (RB) increases by 9 points (from 73 to 82) with a transfer worth €17M. Liverpool's lowest rating in the current team is 80. Player values tend to increase sharply as the rating increases:

Therefore, it is clear why some teams have an advantage in the transfer season. For these teams, it is easy to improve the team by replacing the weakest player. Consider these two extremes: Manchester City has to spend €170M to improve its total rating by 28 points, whereas Arsenal increases its total rating by the same amount by spending only €90M.

Here's how the old and new lineups look for each team. New transfers are coloured red while existing players are in blue: 

Budget limitations
In the next problem, we look at how the budget affects the decisions. We vary Liverpool's transfer budget from €0 to €200M in increments of €10M to see how it affects the outcome.

| New Rating | Budget | Money Spent | Efficiency | Transfers |
|---|---|---|---|---|
| 932 | €0M | €0M | 0 | |
| 933 | €10M | €7M | | Łukasz Piszczek |
| 936 | €20M | €20M | | Łukasz Piszczek, João Miranda de Souza Filho |
| 939 | €30M | €24M | | Thiago Emiliano da Silva |
| 942 | €40M | €38M | | Łukasz Piszczek, Giorgio Chiellini |
| 943 | €50M | €48M | | Lars Bender, Giorgio Chiellini |
| 945 | €60M | €56M | | Kyle Walker, Giorgio Chiellini |
| 946 | €70M | €62M | | César Azpilicueta Tanco, Giorgio Chiellini |
| 947 | €80M | €76M | | Joshua Kimmich, Giorgio Chiellini |
| 949 | €90M | €90M | | Łukasz Piszczek, Giorgio Chiellini, Sergio Busquets Burgos |
| 950 | €100M | €98M | | Giorgio Chiellini, Luka Modrić |
| 952 | €110M | €107M | | Kyle Walker, Giorgio Chiellini, Sergio Busquets Burgos |
| 953 | €120M | €116M | | Kyle Walker, Giorgio Chiellini, David Josué Jiménez Silva |
| 955 | €130M | €129M | | César Azpilicueta Tanco, Giorgio Chiellini, Luka Modrić |
| 955 | €140M | €129M | | César Azpilicueta Tanco, Giorgio Chiellini, Luka Modrić |
| 957 | €150M | €149M | | César Azpilicueta Tanco, Giorgio Chiellini, Fernando Luiz Rosa, Luka Modrić |
| 958 | €160M | €160M | | César Azpilicueta Tanco, Giorgio Chiellini, Jordi Alba Ramos, David Josué Jiménez Silva |
| 959 | €170M | €167M | | César Azpilicueta Tanco, Giorgio Chiellini, Jordi Alba Ramos, Luka Modrić |
| 961 | €180M | €180M | | César Azpilicueta Tanco, Giorgio Chiellini, Luka Modrić, Sergio Busquets Burgos |
| 962 | €190M | €189M | | César Azpilicueta Tanco, Giorgio Chiellini, Luka Modrić, David Josué Jiménez Silva |
| 962 | €200M | €189M | | César Azpilicueta Tanco, Giorgio Chiellini, Luka Modrić, David Josué Jiménez Silva |

 

As seen below in detail, efficiency (total rating increase per million euros) decreases as we pay more money for a relatively lower change, as expected.

It seems Liverpool gets the best value for money if the Reds transfer Thiago Emiliano da Silva for the CB position. Notice that efficiency converges to a total rating increase of 0.16 per million euros spent as we keep increasing the budget.
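Both observations can be checked against the budget table with a few lines of Python. The sketch below recomputes efficiency for each distinct solution in the sweep, using the money spent and new rating figures reported for Liverpool; the variable names are my own.

```python
BASE_RATING = 932  # Liverpool's total rating with no transfers

# (money spent in million euros, new total rating), one entry per distinct solution
sweep = [
    (7, 933), (20, 936), (24, 939), (38, 942), (48, 943), (56, 945),
    (62, 946), (76, 947), (90, 949), (98, 950), (107, 952), (116, 953),
    (129, 955), (149, 957), (160, 958), (167, 959), (180, 961), (189, 962),
]

# Rating increase per million euros spent for each solution
efficiency = [(spent, (rating - BASE_RATING) / spent) for spent, rating in sweep]

best_spent, best_eff = max(efficiency, key=lambda pair: pair[1])
last_eff = efficiency[-1][1]

print(f"best: {best_eff:.2f} at €{best_spent}M, largest budget: {last_eff:.2f}")
```

The most efficient solution is the €24M one, which corresponds to the single transfer of Thiago Emiliano da Silva in the table.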

Increasing the potential
We have looked only at the current ratings of the players up to this point. The next problem we solve includes "potential" ratings of the new transfers. Naturally, young players have a significantly higher potential value compared to the old players. We need to replace the rating constraint as follows:

$$r_j = \bar{P}_j + \sum_{i \in P : (i,j) \in E} (P_i - \bar{P}_j) \cdot t_{ij} \quad \forall j \in S$$

where $P_i$ is the potential rating of player $i$, and $\bar{P}_j$ is the potential of the current player at position $j$ in the team.
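To see what this constraint evaluates to for a given set of transfers, here is a small sketch. The position names, player labels and potential values are hypothetical; the point is only that the potential of a new transfer is compared with the potential of the current player, not with the current rating.

```python
def final_ratings(base, score, transfers):
    """Evaluate r_j = base_j + sum_i (score_i - base_j) * t_ij for each position j.

    base: position -> potential of the current player
    score: player -> potential rating
    transfers: list of (player, position) pairs with t_ij = 1
    """
    return {
        j: base[j] + sum(score[i] - base[j] for (i, pos) in transfers if pos == j)
        for j in base
    }


# Hypothetical potentials of current starters and of two transfer targets
base_potential = {"CM": 84, "CB": 85, "GK": 90}
potential = {"X": 91, "Y": 92}
transfers = [("X", "CM"), ("Y", "CB")]

new_potential = final_ratings(base_potential, potential, transfers)
gain = sum(new_potential.values()) - sum(base_potential.values())
print(new_potential, gain)
```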

For players under 25 years old, the optimal solution is to replace Henderson and Matip with Melo and de Ligt for €36M and €44M, respectively. These changes increase the potential rating by 18 points:

Edit: An earlier version of the blog post compared potential ratings of new transfers to current ratings of the current team. After fixing the problem, results have changed slightly.

Edit #2: We have updated results after fixing a filtering issue with the CSV database.

Dream Team under 23
Based on reader suggestions, we had a look at the optimal squad under a €150M budget. Our objective is to maximise the potential rating and create a full team. I chose a 4-4-2 formation for illustration purposes. The optimal squad costs €148.3M and its potential rating is 982:

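| Pos | Player | Rating | Potential | Paid |
|---|---|---|---|---|
| GK | Gianluigi Donnarumma | 83 | 94 | |
| LB | Thilo Kehrer | 79 | 87 | |
| LCB | William Saliba | 71 | 88 | |
| RCB | Boubacar Kamara | 75 | 88 | |
| RB | Trent Alexander-Arnold | 80 | 89 | |
| LCM | Rodrigo Bentancur | 78 | 90 | |
| CM | Ricard Puig Martí | 69 | 89 | |
| RCM | Sandro Tonali | 73 | 90 | |
| CAM | Phil Foden | 75 | 90 | |
| LS | Christian Kouamé | 75 | 89 | |
| RS | Ezequiel Barco | 73 | 88 | |
| Total | | 831 | 982 | |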
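Building a full squad is a slightly different problem: exactly one player must be chosen for every position. A minimal sketch of this variant, using dynamic programming over the money spent, is given below. It assumes integer costs and that each candidate is listed for a single position; the candidate names, potentials and costs are hypothetical, and this is not the model used to produce the table above.

```python
def best_squad(candidates, budget):
    """Pick exactly one player per position to maximise total potential.

    candidates: position -> list of (name, potential, cost) with integer costs
    Returns (total_potential, {position: name}), or None if no squad fits.
    """
    # states maps money spent -> (total potential, picks) for positions seen so far
    states = {0: (0, {})}
    for pos, options in candidates.items():
        new_states = {}
        for spent, (total, picks) in states.items():
            for name, potential, cost in options:
                s = spent + cost
                if s > budget:
                    continue
                # Keep only the best squad for each amount of money spent
                if s not in new_states or total + potential > new_states[s][0]:
                    new_states[s] = (total + potential, {**picks, pos: name})
        states = new_states
    if not states:
        return None
    return max(states.values(), key=lambda state: state[0])


# Hypothetical candidates: (name, potential, cost in million euros)
candidates = {
    "GK": [("g1", 94, 30), ("g2", 88, 10)],
    "CB": [("c1", 88, 8), ("c2", 90, 25)],
    "ST": [("s1", 89, 12), ("s2", 92, 40)],
}
squad = best_squad(candidates, budget=50)
print(squad)
```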

This concludes this brief analysis of potential transfers for top Premier League teams using Python and SAS Viya. As usual, all the code for the problem is available on GitHub.

