Statistics question

#1
Hi there,

I have a stats question regarding a project I'm working on that I do not have the answer to and I would be grateful if anyone can help me.

I have a large set of historical data detailing the results of a series of two-person contests involving a number of individuals (i.e. more than 2). For example, there may be a group of people, all of whom compete head to head with one another in a series of contests. Each contest is between two people from the group, and there is a winner and a loser for that contest.

Furthermore, each of these individuals has a 'rating' (indexed to 1,000) which is designed as an indicator of ability, i.e. the larger the rating, the more likely an individual is to beat an individual with a lower rating. These ratings subsequently change depending on the result of the contest.

I am looking to use these 'pre-contest ratings' to generate a probability of one player beating another. E.g., if player A has a rating of 950 and player B has a rating of 1200, I am looking to calculate the probability of player A being victorious.

I was thinking that it makes sense to plot a scatter graph of the winner's rating against the loser's rating, but I'm unsure how to generate a probability from this.

Any help would be most gratefully received. I was hoping it would be something I could do in either Excel or SPSS, as these are programs I am familiar with.
 

JohnM

TS Contributor
#2
What you would need to do is look through the historical data for the paired contests, look at the rating of each winner and the rating of each loser, and generate probabilities for each.

For instance, you may find that players with ratings between 900 and 1000 win against players with ratings between 1200 and 1300 approximately 35% of the time, etc.
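
The band-by-band tabulation described above could be sketched as follows. This is only a minimal illustration in Python (the thread itself is about Excel/SPSS); the function names `band` and `band_win_rates`, the 100-point band width, and the sample results are all assumptions made up for the example.

```python
from collections import defaultdict


def band(rating, width=100):
    """Return the lower edge of the rating band that contains `rating`."""
    return (rating // width) * width


def band_win_rates(contests, width=100):
    """Estimate band-versus-band win rates from historical results.

    `contests` is an iterable of (winner_rating, loser_rating) pairs,
    using pre-contest ratings. The result maps (band_a, band_b) to the
    share of band_a-versus-band_b contests won by the band_a player.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner_rating, loser_rating in contests:
        w = band(winner_rating, width)
        l = band(loser_rating, width)
        wins[(w, l)] += 1
        games[(w, l)] += 1
        games[(l, w)] += 1  # the same contest seen from the loser's side
    return {pair: wins[pair] / games[pair] for pair in games}


# Invented results, for illustration only (not taken from the real data).
sample = [(950, 1250), (1210, 980), (1290, 905)]
rates = band_win_rates(sample)
print(rates[(900, 1200)], rates[(1200, 900)])
```

Bands that contain only a handful of contests will give noisy percentages, so the band width is worth experimenting with.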
 

JohnM

TS Contributor
#4
Yeah, it's not very complicated, just a bit tedious when you consider that you have to account for every pairing....
 
#5
You might make a histogram of the % of wins by the higher-rated player for a given difference in rating. For example, your bin width might be 100, with one bin centered at 1000; see what that looks like and then adjust your bin width.
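
A rough sketch of that tabulation, assuming the data is available as (winner rating, loser rating) pairs: the function name `higher_rated_win_rate` and the sample results are invented for illustration, and the bins here are labelled by their lower edge rather than their centre.

```python
from collections import defaultdict


def higher_rated_win_rate(contests, bin_width=100):
    """Share of contests won by the higher-rated player, per rating-gap bin.

    `contests` is an iterable of (winner_rating, loser_rating) pairs.
    Keys of the result are the lower edges of the rating-difference bins.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner_rating, loser_rating in contests:
        gap = abs(winner_rating - loser_rating)
        if gap == 0:
            continue  # equal ratings: there is no higher-rated player
        gap_bin = (gap // bin_width) * bin_width
        games[gap_bin] += 1
        if winner_rating > loser_rating:
            wins[gap_bin] += 1
    return {b: wins[b] / games[b] for b in sorted(games)}


# Invented results, for illustration only.
sample = [(1200, 950), (950, 1200), (1300, 1010), (1100, 1050)]
for gap_bin, share in higher_rated_win_rate(sample).items():
    print(f"gap {gap_bin}-{gap_bin + 99}: {share:.0%} won by higher-rated player")
```

Re-running with a different `bin_width` is the "adjust your bin width" step.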

Are you working with any kind of handicapping system, or trying to find a way to do that? If you are, I can send you in a good direction once you have answered the question above.

cheers
jerry
 
#6
Sorry, I'm not sure I understand. What do you mean by a handicapping system? It sounds like it may be useful, but I'm not sure how it applies.

The output I am ultimately looking for is a model which can generate the probability of Player A with rating X beating Player B with rating Y.

thanks,

craig
 

JohnM

TS Contributor
#7
I guess what Jerry means by handicapping is a way to "level the playing field" between players of differing ability - if player A's probability of beating player B is less than 50%, then player A's scores would be adjusted upward to compensate.

This method is used often in amateur / club tournaments in bowling, golf, and archery so that the top player(s) aren't always taking home the trophies...

....but it doesn't sound like you need that - you just want to know probabilities, right?
 
#8
Yeah, exactly: the ultimate output I am after is the probability. The way I have done it so far is by using cross tabs to generate a probability of, e.g., a player with ranking 1000-1100 beating a player with ranking 700-800. This provides a matrix of probabilities, with the rank of the winner on one axis and the rank of the loser on the other. I then ran a least squares regression for each winner band and for each loser band. This gives linear formulas that let me generate a probability for a specific ranking of, say, 1147.

For a given match I work out the probability of player A beating player B twice: once from the regression model that applies to player A's pre-match ranking, and once from the regression model that applies to player B's pre-match ranking. I then take the average of these two probabilities to come up with a 'true probability'.

Does it sound sensible?