# Is there a good tool for finding the best curve to fit my data?

#### Miner

##### TS Contributor
A different approach using nonlinear regression.

#### Jennifer Murphy

##### Member
> A different approach using nonlinear regression.

That looks like a pretty good fit. How did you do that?

It looks like a reciprocal quadratic (1 / (a + b(x-c)^2)).

I tried using the curve-fitting trendlines in Excel, but they only offer a few options and it's up to me to guess by trial and error.

What are the green dotted lines?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Are the observations independent? If not, you are allowing a person to vote twice, and their votes are likely correlated. That seems to be an issue for the normal or any other distribution, right?

#### katxt

##### Well-Known Member
There is no real reason to expect that there is a defined mathematical distribution for your data.
Perhaps you could define it as a mixed distribution, say normal 50 to 100, and uniform above with weights to choose the distribution you use.
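
A minimal sketch of the mixed distribution described above: a normal component restricted to 50 to 100 and a uniform component above 100, combined with a weight. Every number here (the weight `w_normal`, the mean and standard deviation, and the upper limit of 150) is a hypothetical placeholder, not something estimated from the thread's data.

```python
import numpy as np

def sample_mixture(n, w_normal=0.8, mu=75.0, sigma=10.0,
                   lo=50.0, split=100.0, hi=150.0, seed=0):
    """Draw n values from a two-part mixture (all defaults are hypothetical).

    With probability w_normal a value comes from a normal(mu, sigma)
    truncated to [lo, split]; otherwise from a uniform on [split, hi).
    """
    rng = np.random.default_rng(seed)
    from_normal = rng.random(n) < w_normal
    n_norm = int(from_normal.sum())
    out = np.empty(n)

    # Uniform component for the long upper tail.
    out[~from_normal] = rng.uniform(split, hi, size=n - n_norm)

    # Truncated normal component via simple rejection sampling.
    draws = np.empty(0)
    while draws.size < n_norm:
        cand = rng.normal(mu, sigma, size=2 * (n_norm - draws.size) + 10)
        draws = np.concatenate([draws, cand[(cand >= lo) & (cand <= split)]])
    out[from_normal] = draws[:n_norm]
    return out

samples = sample_mixture(10_000)
print("share above 100:", np.mean(samples > 100.0))
```

The weight controls how much probability goes to the tail, so it would have to be chosen or estimated separately from the normal parameters.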

#### Miner

##### TS Contributor
> That looks like a pretty good fit. How did you do that?
>
> It looks like a reciprocal quadratic (1 / (a + b(x-c)^2)).
>
> I tried using the curve-fitting trendlines in Excel, but they only offer a few options and it's up to me to guess by trial and error.
>
> What are the green dotted lines?

I used a nonlinear regression in Minitab. The function used is the Holliday model, a yield density function used in agriculture. The green lines are the prediction intervals.
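
For readers without Minitab, here is a rough sketch of the same kind of fit in Python with `scipy.optimize.curve_fit`. It assumes the usual Holliday form y = 1 / (a + b·x + c·x²), which is the reciprocal quadratic mentioned above written with expanded coefficients. The data are synthetic and the "true" parameters are hypothetical, chosen only so the example runs; this is not the fit shown in the thread, and it does not compute prediction intervals.

```python
import numpy as np
from scipy.optimize import curve_fit

def holliday(x, a, b, c):
    """Holliday yield-density form: y = 1 / (a + b*x + c*x**2)."""
    return 1.0 / (a + b * x + c * x**2)

# Synthetic, hypothetical data standing in for the real observations.
rng = np.random.default_rng(0)
x = np.linspace(50.0, 150.0, 60)
true_params = (0.5, -0.009, 5e-5)
y = holliday(x, *true_params) + rng.normal(0.0, 0.2, size=x.size)

# Starting values: 1/y is roughly quadratic in x, so an ordinary
# polynomial fit on the reciprocal gives a reasonable first guess.
c0, b0, a0 = np.polyfit(x, 1.0 / y, 2)

popt, pcov = curve_fit(holliday, x, y, p0=(a0, b0, c0))
print("fitted a, b, c:", popt)
```

Nonlinear least squares is sensitive to starting values, which is why the sketch seeds the optimizer from a linear fit on 1/y rather than from arbitrary guesses.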

#### Dason

> There is no real reason to expect that there is a defined mathematical distribution for your data.
> Perhaps you could define it as a mixed distribution, say normal 50 to 100, and uniform above with weights to choose the distribution you use.

Yeah. Or just use the empirical distribution derived from the data. Using a continuous distribution can be a useful approximation in some situations, but I still don't see a need for it here.

A normal distribution seems wrong for that reason but there are others. It doesn't look symmetric even after accounting for the "truncation". It's a discrete distribution so something like negative binomial might work better but still with the process you're describing I would not expect negative binomial to work great either. The truncated normal you fit looks to provide way too large of estimates in the 82-102 range and then give essentially zero weight for values above that but those aren't exactly super rare events.

There are a few reasons. I have more but hopefully that's good enough to convince you I'm not just talking out of my ass.

Now I "challenge" you to give reasons why you can't just use the empirical distribution here. Heck, apply some sort of discrete smoothing if you want. But I agree with others that I don't see a reason a common statistical distribution would apply to your case. It could be a big mixture distribution with parameters based on the player and the kind of day they're having - but you'll never be able to fit that properly, so why not just go the easy route?

#### Jennifer Murphy

##### Member
> Are the observations independent? If not, you are allowing a person to vote twice, and their votes are likely correlated. That seems to be an issue for the normal or any other distribution, right?

The observations are not strictly independent as with a coin toss trial. The constant is the game. There are 52! possible deals. The total number of unique games is substantially less than that because, for example, the order of the 24 cards in the draw pile is irrelevant, so the actual number is somewhat less than 52!/24!. And a fair number of games (about 15%, I believe) are unwinnable. Of those that are winnable, they vary in difficulty.
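
To give a sense of scale, the 52!/24! bound can be computed exactly with Python's integer arithmetic. This only evaluates the expression as stated in the post, taking its premise about the draw pile as given; it does not verify that premise.

```python
import math

# Upper bound from the post: orderings of 52 cards with the order of
# the 24 draw-pile cards ignored, i.e. 52! / 24!.
upper_bound = math.factorial(52) // math.factorial(24)
print(f"52!/24! is roughly {upper_bound:.2e}")
```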

So we have three major variables: (1) The difficulty of the deal. (2) The skill of the player. (3) The mental state of the player.

I contend that if we have 1,000 people play a particular deal (constant difficulty), the number of moves they take would follow a normal curve. Smarter players would take fewer moves. I contend that if there were a way to have a particular player play a particular deal 1,000 times without any memory of any previous tries, that would also follow a normal curve. Players have good days and bad days.

Each player will gain skill as they play more games. That can be thought of as a different player. I contend that a large number of players playing a large number of games will approach a normal curve. It may not be perfect from a statistically theoretical perspective, but close enough to be useful. One of the ways that I believe it would be useful is to be able to compare a player's results against the curve.

#### Dason

I contend that you should at the absolute very least say "approximate normal" because there is literally no way it can be normally distributed. It also probably isn't even symmetric to be honest so I think you're just saying what you *want* to happen because the normal distribution is easy to work with.

Is there a reason you're super concerned with comparing the actual distribution of the player's results versus the "theoretical" distribution (which I contend doesn't actually exist unless you're specifying a specific algorithm or something to compare against)? Why not just compare means/medians if you want to know if somebody is doing better/worse than expected?

#### katxt

##### Well-Known Member
Or find the ranking in all the results accumulated up to that time?

#### Dason

> Or find the ranking in all the results accumulated up to that time?

Sure! Essentially just using the empirical distribution as I suggested before. I still really don't see why, without a rigorous theoretical distribution to compare against, they want to do something other than compare the mean or take a percentile ranking of some sort.
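
A small sketch of what ranking against the empirical distribution could look like. The move counts in `past` are made up for illustration, and `ecdf` and `percentile_rank` are hypothetical helper names, not functions from any package mentioned in the thread.

```python
import numpy as np

def ecdf(past, value):
    """Empirical CDF: fraction of past results less than or equal to value."""
    past = np.asarray(past)
    return float(np.mean(past <= value))

def percentile_rank(past, value):
    """Mid-rank percentile: results below value plus half of the ties."""
    past = np.asarray(past)
    below = np.sum(past < value)
    ties = np.sum(past == value)
    return float(100.0 * (below + 0.5 * ties) / past.size)

# Hypothetical move counts accumulated so far.
past = [88, 92, 95, 95, 101, 110, 120, 134]
print(ecdf(past, 95))             # 4 of 8 results are <= 95
print(percentile_rank(past, 95))  # 2 below, 2 tied, out of 8
```

Since fewer moves is better in this game, a low percentile rank here would mean a strong result.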

#### Dason

> Maybe this one will suit your needs:
> www.lerenisplezant.be/fitting.htm

That just looks awful and the "What is different from other programs" section just makes it sound like the creator doesn't really understand statistics or at the very least is not good at explaining things. It sounds more like "let's try to convince people that don't know anything that we're great" as opposed to "let's be great and give good evidence of that". I could be wrong - but I just don't get a good impression from that site.

Normally I'd just straight up remove as spam but since it appears you're the author I'll allow you to respond before removing.

#### Koen Van de moortel

##### New Member
> That just looks awful and the "What is different from other programs" section just makes it sound like the creator doesn't really understand statistics or at the very least is not good at explaining things.

What do you mean by that??? I definitely have a lot of experience with regression! I'm just explaining people why they should try this. There are so many other programs for regression analysis, but this one has a new algorithm, so what else could I do to catch people's attention? Any constructive suggestion is appreciated.

#### GretaGarbo

##### Human
I tried a few distributions from the fitdistrplus package. They did not fit so well. So far, the best suggestion, as I understand it, is the suggestion from @Dason, to use the empirical distribution function.

@Koen Van de moortel, you suggest your own program.

> I'm just explaining people why they should try this.

Then I suggest that @Koen Van de moortel use that program and show that it gives a good fit to these data.

It is nice that people bring in new models and suggestions. But @Koen Van de moortel, to be frank, I am not so impressed by the site. By the way, the program seems to be more about various linear and nonlinear regression methods and not so much about finding a distribution.

> but this one has a new algorithm

What is the new algorithm? The site talked about least squares, but that is not so new.

> (2) The skill of the player. (3) The mental state of the player.

So, @Jennifer Murphy, it seems like the data is about players' skill and the amount of training. So there are a few omitted relevant explanatory variables here. Then it is difficult to get a good fit.

Another possibility is to use "Distributions for Modeling Location, Scale and Shape" in here and here. They have more than 100 distributions to select from.

#### Koen Van de moortel

##### New Member
@GretaGarbo Yes, the software is meant to find the best fitting parameters for a number of models, by iteration and taking measurement imprecisions into account (most of the others just neglect that). In the current version, you can also find the best fitting Gauss distribution or a mix of two of those. If you would read what is said, you would see the link to the article that describes my innovation:
http://www.physicsjournal.net/article/view/30/3-3-11

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Question: if it is a new algorithm, have you published on it in a peer-reviewed journal? Perhaps conducting a vast simulation study versus other approaches. Ideally when doing this, you could outsource the data generating function for the creation of samples to another person to help ensure DGFs were not cherry-picked.

#### Koen Van de moortel

##### New Member
> Question: if it is a new algorithm, have you published on it in a peer-reviewed journal? Perhaps conducting a vast simulation study versus other approaches. Ideally when doing this, you could outsource the data generating function for the creation of samples to another person to help ensure DGFs were not cherry-picked.

It's published in the above-mentioned journal, and I can assure you that I have done extensive research to test it.
See: https://www.researchgate.net/profile/Koen-Van-De-Moortel/research