Set Level Prediction in Tennis

#1
Hi all, hope someone can point me in the right direction to help me with the problem below!

The Data

Let's assume that the dataset I have is solely set-level win/loss data for every professional tennis match in the past X years.

PlayerA, PlayerB, Set#, playerA_win

Nadal, Federer, 1, 1
Nadal, Federer, 2, 1
Nadal, Federer, 3, 0

Apologies, wasn't sure how to stick in a proper table!

The Problem

I am looking to come up with a predictive factor based on two players' historical performances. However, I am not keen on using something like a basic 'win%': the quality of opponents each player has faced is not the same, so raw win%s aren't particularly predictive when used as features in ML.

Another strategy I have previously tried is the 'common opponents' method, which looks at how each player (PlayerA & PlayerB) performs against opponents that the other player has also faced. These results are then compared to estimate the difference in quality between A & B. For example, if PlayerA beats PlayerC 60% of the time and PlayerB beats PlayerC 75% of the time, then we can calculate AvsB on the assumption that performance is transitive between players. However, I have found that this ignores results against opponents the other player hasn't faced. For example, if PlayerA plays PlayerC/D/E and gets battered by D & E, but PlayerB has only played PlayerC, then only the results against C are used and we are left with large gaps in the predictive quality of our final number.
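To make the gap concrete, here is a minimal sketch of the common-opponents calculation in Python. The rows and the function names (`win_rates`, `common_opponent_edge`) are made up for illustration, and averaging the win-rate differences is just one simple way to combine them, not necessarily what I used.

```python
from collections import defaultdict

# Hypothetical set-level rows: (PlayerA, PlayerB, Set#, playerA_win)
rows = [
    ("A", "C", 1, 1), ("A", "C", 2, 1), ("A", "C", 3, 0),
    ("B", "C", 1, 1), ("B", "C", 2, 1), ("B", "C", 3, 1), ("B", "C", 4, 0),
    ("A", "D", 1, 0), ("A", "D", 2, 0),  # A gets battered by D; B never played D
]

def win_rates(rows):
    """Return {player: {opponent: fraction of sets won against that opponent}}."""
    wins = defaultdict(lambda: defaultdict(int))
    played = defaultdict(lambda: defaultdict(int))
    for a, b, _set_no, a_win in rows:
        played[a][b] += 1
        played[b][a] += 1
        wins[a][b] += a_win
        wins[b][a] += 1 - a_win
    return {p: {o: wins[p][o] / played[p][o] for o in played[p]} for p in played}

def common_opponent_edge(rates, a, b):
    """Mean win-rate difference (a minus b) over shared opponents.

    Returns (edge, shared_opponents); edge is None when nothing is shared.
    """
    common = (set(rates.get(a, {})) & set(rates.get(b, {}))) - {a, b}
    if not common:
        return None, common
    diffs = [rates[a][o] - rates[b][o] for o in common]
    return sum(diffs) / len(diffs), common

rates = win_rates(rows)
edge, used = common_opponent_edge(rates, "A", "B")
print(edge, used)
```

Only PlayerC ends up in `used`, so A's two lost sets against D never influence the number, which is exactly the problem described above.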

I appreciate this is long-winded, but I have been racking my brains for days and am in desperate need of extra thoughts!

Thanks
 

Dason

Ambassador to the humans
#2
A fairly simplistic but surprisingly good place to start might be a Bradley–Terry model. https://en.m.wikipedia.org/wiki/Bradley–Terry_model
I helped somebody with predicting soccer games and it worked better than expected. We did incorporate external predictors to modify the probabilities and help the model, but this might give a good starting place for research.
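As a rough illustration of what fitting such a model to set-level data could look like, here is a minimal sketch in plain Python. It uses the standard iterative (minorization-maximization) update for Bradley–Terry strengths; this is not the code from the soccer project. The function names and the "PlayerC" rows are hypothetical, and the sketch assumes every player has won and lost at least one set, otherwise the strengths drift towards zero or infinity.

```python
from collections import defaultdict

def fit_bradley_terry(rows, n_iter=200):
    """Estimate Bradley-Terry strengths from (PlayerA, PlayerB, Set#, playerA_win) rows."""
    wins = defaultdict(float)        # total sets won by each player
    pair_sets = defaultdict(float)   # sets played between each unordered pair
    players = set()
    for a, b, _set_no, a_win in rows:
        players.update((a, b))
        wins[a] += a_win
        wins[b] += 1 - a_win
        pair_sets[frozenset((a, b))] += 1

    strength = {p: 1.0 for p in players}
    for _ in range(n_iter):
        new = {}
        for i in players:
            # MM update: wins_i / sum_j n_ij / (strength_i + strength_j)
            denom = sum(
                cnt / (strength[i] + strength[j])
                for pair, cnt in pair_sets.items() if i in pair
                for j in pair - {i}
            )
            new[i] = wins[i] / denom
        # Strengths are only defined up to scale, so fix the mean at 1.
        total = sum(new.values())
        strength = {p: s * len(players) / total for p, s in new.items()}
    return strength

def win_probability(strength, a, b):
    """Model probability that a wins a set against b."""
    return strength[a] / (strength[a] + strength[b])

# First three rows are from the original post; the PlayerC rows are invented.
rows = [
    ("Nadal", "Federer", 1, 1), ("Nadal", "Federer", 2, 1), ("Nadal", "Federer", 3, 0),
    ("Federer", "PlayerC", 1, 1), ("Federer", "PlayerC", 2, 0),
    ("PlayerC", "Nadal", 1, 0), ("PlayerC", "Nadal", 2, 1),
]
strength = fit_bradley_terry(rows)
print(win_probability(strength, "Nadal", "Federer"))
```

Because every result feeds into a single strength per player, sets against opponents the other player never faced still affect the prediction, which is what the common-opponents approach was missing.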
 
#3
Hmm, okay, interesting. This looks like we might be on the right path. What do you mean by 'using external predictors'?
 

Dason

Ambassador to the humans
#4
For the soccer model we used things like overall team salary, number of wins in the last ten games, certain Vegas odds, etc., as external variables to help with the predictions.
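One possible way to combine a strength rating with external variables like these is a logistic regression on player-A-minus-player-B feature differences. The sketch below is only an illustration of that idea, not the soccer model itself: the two features (log-strength difference, difference in share of recent sets won), the toy numbers, and the function names are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Gradient ascent on the logistic log-likelihood, with no intercept.

    Dropping the intercept keeps predictions symmetric: swapping A and B
    negates every feature difference and turns p into 1 - p.
    """
    w = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(wk * xk for wk, xk in zip(w, xi)))
            for k, xk in enumerate(xi):
                grad[k] += err * xk
        w = [wk + lr * gk / len(X) for wk, gk in zip(w, grad)]
    return w

def predict(w, x):
    """Probability that player A wins the set, given A-minus-B feature differences."""
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)))

# Invented toy data. Each row: [log-strength difference, recent-form difference]
X = [[0.8, 0.3], [0.5, -0.1], [-0.3, 0.2], [-0.9, -0.4],
     [0.2, 0.1], [-0.4, -0.2], [0.6, -0.3], [-0.1, 0.1]]
y = [1, 1, 0, 0, 1, 0, 0, 1]  # 1 if player A won the set

w = fit_logistic(X, y)
print(w, predict(w, [0.5, 0.2]))
```

In practice the first feature could come from a fitted rating model, and further columns (odds, rankings, and so on) would be added the same way as A-minus-B differences.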