Is there a good tool for finding the best curve to fit my data?

#22
A different approach using nonlinear regression.
That looks like a pretty good fit. How did you do that?

It looks like a reciprocal quadratic: 1 / (a + b(x-c)^2).

I tried using the curve-fitting trendlines in Excel, but they only offer a few options and it's up to me to guess by trial and error.

What are the green dotted lines?
 

hlsmith

Less is more. Stay pure. Stay poor.
#23
Are the observations independent? If not, you are allowing a person to vote twice, and their votes are likely correlated. That seems to be an issue for the normal or any distribution, right?
 

katxt

Well-Known Member
#24
There is no real reason to expect that there is a defined mathematical distribution for your data.
Perhaps you could define it as a mixed distribution, say normal from 50 to 100 and uniform above that, with weights determining which component you draw from.
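A mixture like that is easy to sample from, even if it has no closed form. Here is a minimal sketch in Python; the weight, the normal parameters, and the tail cutoffs are all placeholder assumptions, not values fitted to the actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n, w=0.9, mu=75, sigma=12, lo=50, hi=100, tail_max=150):
    """Draw n values: with probability w from a normal truncated to [lo, hi],
    otherwise uniform on (hi, tail_max). All parameter values are illustrative."""
    out = np.empty(n)
    from_normal = rng.random(n) < w
    k = from_normal.sum()
    # rejection-sample the truncated normal component
    samples = []
    while len(samples) < k:
        x = rng.normal(mu, sigma, size=k)
        samples.extend(x[(x >= lo) & (x <= hi)])
    out[from_normal] = samples[:k]
    out[~from_normal] = rng.uniform(hi, tail_max, size=n - k)
    return out

moves = sample_mixture(10_000)
```

Fitting the weights and component parameters to real data would be the harder part; this only shows the shape of the idea.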
 

Miner

TS Contributor
#25
That looks like a pretty good fit. How did you do that?

It looks like a reciprocal quadratic: 1 / (a + b(x-c)^2).

I tried using the curve-fitting trendlines in Excel, but they only offer a few options and it's up to me to guess by trial and error.

What are the green dotted lines?
I used a nonlinear regression in Minitab. The function used is the Holliday model, a yield density function used in agriculture. The green lines are the prediction intervals.
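For anyone without Minitab, a rough equivalent of this kind of nonlinear fit can be sketched with SciPy's `curve_fit`. The sketch below fits the reciprocal quadratic form guessed above rather than Minitab's exact Holliday parameterization, and the synthetic data and starting values are assumptions for illustration, not the actual data:

```python
import numpy as np
from scipy.optimize import curve_fit

def reciprocal_quadratic(x, a, b, c):
    # y = 1 / (a + b*(x - c)^2), the form guessed earlier in the thread
    return 1.0 / (a + b * (x - c) ** 2)

# synthetic data standing in for the histogram (assumed, for illustration only)
x = np.linspace(50, 150, 60)
y_true = reciprocal_quadratic(x, 8.0, 0.01, 90.0)
rng = np.random.default_rng(1)
y = y_true + rng.normal(0, 0.002, size=x.size)

# starting values matter for nonlinear least squares; p0 = (a, b, c)
params, cov = curve_fit(reciprocal_quadratic, x, y, p0=(5.0, 0.005, 85.0))
```

The diagonal of `cov` gives parameter variances, which is roughly what prediction intervals like the green dotted lines are built from.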
 

Dason

Ambassador to the humans
#26
There is no real reason to expect that there is a defined mathematical distribution for your data.
Perhaps you could define it as a mixed distribution, say normal 50 to 100, and uniform above with weights to choose the distribution you use.
Yeah. Or just use the empirical distribution derived from the data. Using a continuous distribution can be a useful approximation in some situations, but I still don't see a need for it here.

A normal distribution seems wrong for that reason, but there are others. It doesn't look symmetric even after accounting for the "truncation". It's a discrete distribution, so something like a negative binomial might work better, but with the process you're describing I would not expect a negative binomial to work great either. The truncated normal you fit looks to provide estimates that are far too large in the 82-102 range and then gives essentially zero weight to values above that, but those aren't exactly super rare events.

There are a few reasons. I have more but hopefully that's good enough to convince you I'm not just talking out of my ass.

Now I "challenge" you to give reasons why you can't just use the empirical distribution here. Heck, apply some sort of discrete smoothing if you want. But I agree with others that I don't see a reason a common statistical distribution would apply to your case. It could be a big mixture distribution with parameters based on the player and the kind of day they're having, but you'll never be able to fit that properly, so why not just go the easy route?
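For what it's worth, "just use the empirical distribution" takes only a few lines. A minimal sketch, with made-up move counts standing in for real game data:

```python
import numpy as np

def ecdf(data):
    """Return a function F where F(x) = proportion of observations <= x."""
    xs = np.sort(np.asarray(data))
    def F(x):
        return np.searchsorted(xs, x, side="right") / xs.size
    return F

# hypothetical move counts from past games (assumption, not real data)
moves = [88, 92, 95, 101, 104, 110, 130, 155]
F = ecdf(moves)
frac = F(104)  # fraction of games finished in 104 moves or fewer
```

No distributional assumptions, and it handles the long right tail and the discreteness automatically.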
 
#27
Are the observations independent? If not, you are allowing a person to vote twice, and their votes are likely correlated. That seems to be an issue for the normal or any distribution, right?
The observations are not strictly independent, as they would be in a coin-toss trial. The constant is the game itself. There are 52! possible deals. The total number of unique games is substantially less than that because, for example, the order of the 24 cards in the draw pile is irrelevant, so the actual number is somewhat less than 52!/24!. And a fair number of games (about 15%, I believe) are unwinnable. Of those that are winnable, they vary in difficulty.

So we have three major variables: (1) The difficulty of the deal. (2) The skill of the player. (3) The mental state of the player.

I contend that if we have 1,000 people play a particular deal (constant difficulty), the number of moves they take would follow a normal curve. Smarter players would take fewer moves. I contend that if there were a way to have a particular player play a particular deal 1,000 times without any memory of any previous tries, that would also follow a normal curve. Players have good days and bad days.

Each player will gain skill as they play more games. That can be thought of as a different player. I contend that a large number of players playing a large number of games will approach a normal curve. It may not be perfect from a statistically theoretical perspective, but close enough to be useful. One of the ways that I believe it would be useful is to be able to compare a player's results against the curve.
 

Dason

Ambassador to the humans
#28
I contend that you should at the absolute very least say "approximate normal" because there is literally no way it can be normally distributed. It also probably isn't even symmetric to be honest so I think you're just saying what you *want* to happen because the normal distribution is easy to work with.

Is there a reason you're super concerned with comparing the actual distribution of the player's results versus the "theoretical" distribution (which I contend doesn't actually exist unless you're specifying a specific algorithm or something to compare against)? Why not just compare means/medians if you want to know if somebody is doing better/worse than expected?
 

Dason

Ambassador to the humans
#30
Or find the ranking in all the results accumulated up to that time?
Sure! That's essentially just using the empirical distribution, as I suggested before. I still don't see why, without a rigorous theoretical distribution to compare against, they would want to do anything other than compare the mean or take a percentile ranking of some sort.
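The percentile-ranking idea is a one-liner with SciPy. A sketch, where the accumulated results and the player's score are made-up numbers:

```python
from scipy import stats

# hypothetical accumulated move counts and one player's latest game (assumed)
all_moves = [85, 90, 92, 96, 99, 103, 108, 115, 122, 140]
player_score = 96

# kind="weak": percentage of accumulated results at or below the score;
# since fewer moves is better, a low percentile here means a good game
pct = stats.percentileofscore(all_moves, player_score, kind="weak")
```

As more games accumulate, the ranking gets more stable, with no curve-fitting needed.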