- Thread starter Jennifer Murphy

A different approach using nonlinear regression.

It looks like a reciprocal quadratic: 1 / (a + b(x - c)^2).

I tried using the curve-fitting trendlines in Excel, but they only offer a few options and it's up to me to guess by trial and error.
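If you're willing to step outside Excel's trendlines, nonlinear least squares can fit the proposed reciprocal quadratic directly. A minimal sketch with `scipy.optimize.curve_fit`, on synthetic data (the parameters 0.02, 0.001, 95 are made up just to exercise the fit; substitute your own x/y columns):

```python
import numpy as np
from scipy.optimize import curve_fit

def recip_quad(x, a, b, c):
    # The proposed form: y = 1 / (a + b * (x - c)^2)
    return 1.0 / (a + b * (x - c) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(60, 140, 81)
y = recip_quad(x, 0.02, 0.001, 95.0) + rng.normal(0, 0.5, x.size)  # noisy placeholder data

# Nonlinear fits are sensitive to starting values; start c near the observed peak.
popt, pcov = curve_fit(recip_quad, x, y, p0=[0.05, 0.001, x[np.argmax(y)]])
a, b, c = popt
print(f"a={a:.4f}, b={b:.5f}, c={c:.1f}")
```

The initial guess `p0` does the work Excel makes you do by trial and error; a bad starting point can still send the optimizer to a poor local minimum.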

What are the green dotted lines?

That looks like a pretty good fit. How did you do that?

There is no real reason to expect that there is a defined mathematical distribution for your data.

Perhaps you could define it as a mixed distribution, say normal 50 to 100, and uniform above with weights to choose the distribution you use.
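A quick sketch of what sampling from that mixture could look like: a normal truncated to [50, 100], plus a uniform tail above 100, mixed by a weight. Every parameter here (mean 75, sd 12, tail cap 160, weight 0.8) is a placeholder, not something estimated from the thread's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def sample_mixture(n, w_normal=0.8, mu=75.0, sigma=12.0, hi=160.0):
    # Truncated-normal component on [50, 100]; truncnorm wants standardized bounds.
    a, b = (50 - mu) / sigma, (100 - mu) / sigma
    trunc = stats.truncnorm(a, b, loc=mu, scale=sigma)
    # With probability w_normal take the normal component, else the uniform tail.
    pick = rng.random(n) < w_normal
    return np.where(pick,
                    trunc.rvs(n, random_state=rng),
                    rng.uniform(100, hi, n))

draws = sample_mixture(10_000)
print(draws.min(), draws.max())  # everything lands in [50, hi]
```

Fitting the weight and component parameters to real data is the hard part; this only shows the shape of the idea.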

A normal distribution seems wrong for that reason, but there are others. The data doesn't look symmetric even after accounting for the "truncation". It's a discrete distribution, so something like a negative binomial might work better, but given the process you're describing I wouldn't expect a negative binomial to fit great either. The truncated normal you fit looks to put far too much weight in the 82-102 range and then gives essentially zero weight to values above that, yet those aren't exactly super rare events.
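For what it's worth, a negative binomial is cheap to try via the method of moments (it only applies when the variance exceeds the mean, i.e. the data is overdispersed). The data below is synthetic, drawn from a known negative binomial just so the estimates can be sanity-checked:

```python
import numpy as np
from scipy import stats

# Placeholder counts from nbinom(r=20, p=0.2): mean 80, variance 400.
counts = stats.nbinom.rvs(20, 0.2, size=5000, random_state=0)

m, v = counts.mean(), counts.var(ddof=1)
# Method-of-moments estimates; only valid when v > m.
p_hat = m / v
r_hat = m * m / (v - m)
print(f"r={r_hat:.1f}, p={p_hat:.3f}")
```

If the moment estimates look reasonable, a proper maximum-likelihood fit (e.g. via statsmodels) is the next step; if `v <= m` on your data, the negative binomial is out anyway.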

There are a few reasons. I have more but hopefully that's good enough to convince you I'm not just talking out of my ass.

Now I "challenge" you to give reasons why you can't just use the empirical distribution here. Heck, apply some sort of discrete smoothing if you want. But I agree with others that I don't see a reason a common statistical distribution would apply to your case. It could be a big mixture distribution with parameters based on the player and the kind of day they're having, but you'll never be able to fit that properly, so why not just go the easy route?
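The empirical route really is this short. A sketch with placeholder Poisson-ish move counts (the Poisson(85) generator is just stand-in data, not a claim about the real process): tabulate the observed pmf, then optionally smooth it with a small discrete kernel.

```python
import numpy as np

rng = np.random.default_rng(2)
moves = rng.poisson(85, size=2000)  # placeholder move counts

# Empirical pmf over the observed support -- no parametric assumptions at all.
values, freq = np.unique(moves, return_counts=True)
pmf = freq / freq.sum()

# Optional light discrete smoothing: a 3-point kernel that sums to 1.
smoothed = np.convolve(pmf, [0.25, 0.5, 0.25], mode="same")
print(values[np.argmax(smoothed)])  # modal move count after smoothing
```

With a couple thousand observations the raw empirical pmf is already usable; the smoothing only matters for sparse bins in the tails.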

Are the observations independent? If not, you are allowing a person to vote twice, and their votes are likely correlated. That seems to be an issue for the normal, or any distribution, right?

So we have three major variables: (1) The difficulty of the deal. (2) The skill of the player. (3) The mental state of the player.

I contend that if we have 1,000 people play a particular deal (constant difficulty), the number of moves they take would follow a normal curve. Smarter players would take fewer moves. I contend that if there were a way to have a particular player play a particular deal 1,000 times, their move counts would also follow a normal curve.

Each player will gain skill as they play more games. That can be thought of as a different player. I contend that a large number of players playing a large number of games will approach a normal curve. It may not be exact, but it should be close.

Is there a reason you're super concerned with comparing the actual distribution of the player's results versus the "theoretical" distribution (which I contend doesn't actually exist unless you're specifying a specific algorithm or something to compare against)? Why not just compare means/medians if you want to know if somebody is doing better/worse than expected?

Or find the ranking in all the results accumulated up to that time?
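Ranking against accumulated results is a one-liner with `scipy.stats.percentileofscore`. The data below is a placeholder uniform sample; since fewer moves is better here, a low percentile means a good game:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
all_results = rng.integers(60, 140, size=500)  # placeholder: everyone's past move counts

new_result = 75
# kind="weak": percentage of past results <= new_result.
pct = stats.percentileofscore(all_results, new_result, kind="weak")
print(f"{pct:.1f}% of past games finished in {new_result} moves or fewer")
```

This sidesteps the distribution question entirely: the ranking is well defined whatever the underlying process is, and it updates naturally as more results accumulate.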