# How well do a few values fit a given distribution?

#### eelioss

##### New Member
Dear all,

I have approximated two one-dimensional random variables $$A$$ and $$B$$ using Gaussian mixture models (3 Gaussians). I used the Matlab function gmdistribution.fit with 10000 values. The resulting distributions are called $$f_{A}(x)$$ and $$f_{B}(x)$$ (shown in the attached Figure 1, with $$f_{A}(x)$$ in red and $$f_{B}(x)$$ in blue).

Now I have a few values (e.g. v=[-1 0 0.5 1 5 6 6.5 7 14], as in Figure 1). This vector produces a very sparse histogram, since it contains so few values.

How probable is it that these values were generated by the $$f_{A}(x)$$ distribution? Or how probable is it that they were generated by the $$f_{B}(x)$$ distribution? I would like to obtain a probability value in order to classify the set of values into category A ($$f_{A}(x)$$ distribution) or category B ($$f_{B}(x)$$ distribution).

I had several ideas:

-> Joint distribution (product of probabilities)... but what happens with outliers? Since $$f_{A}(x)$$ and $$f_{B}(x)$$ are approximations of the real distributions, an outlier might get zero probability for some value of x (see attached Figure 2). So I am not sure about this approach.

-> Average probability: This is a trivial solution I thought of, and it's probably wrong.

-> Hypothesis tests (chi-squared or Kolmogorov–Smirnov): In the case of chi-squared, the data should be binned... what is the optimal size of these bins? In addition, these hypothesis tests produce a p-value, which is not the probability value I am looking for (as far as I understand).
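
To make the first idea concrete, here is a minimal Python sketch of the joint-likelihood approach, computed in log space so that an outlier gives a very negative log-density instead of a product that underflows to zero. It assumes the values are independent and that A and B are equally likely a priori (`prior_a=0.5`). The mixture parameters in `model_a` and `model_b` are hypothetical placeholders, not the fitted values from Figure 1.

```python
import math

def gmm_logpdf(x, weights, means, sds):
    """Log density of a 1-D Gaussian mixture at x, using log-sum-exp."""
    terms = [
        math.log(w) - math.log(s) - 0.5 * math.log(2 * math.pi)
        - 0.5 * ((x - m) / s) ** 2
        for w, m, s in zip(weights, means, sds)
    ]
    top = max(terms)
    # Subtracting the largest term keeps at least one exp() equal to 1,
    # so the sum never underflows to zero, even far in the tails.
    return top + math.log(sum(math.exp(t - top) for t in terms))

def log_likelihood(values, model):
    """Sum of log densities: the log of the product of densities."""
    return sum(gmm_logpdf(x, *model) for x in values)

def posterior_a(values, model_a, model_b, prior_a=0.5):
    """P(A | values) by Bayes' rule, assuming only A or B is possible."""
    la = log_likelihood(values, model_a) + math.log(prior_a)
    lb = log_likelihood(values, model_b) + math.log(1.0 - prior_a)
    top = max(la, lb)
    ea, eb = math.exp(la - top), math.exp(lb - top)
    return ea / (ea + eb)

# Hypothetical parameters (weights, means, standard deviations);
# replace them with the ones returned by the mixture fit.
model_a = ([0.5, 0.3, 0.2], [0.0, 5.0, 10.0], [1.0, 1.5, 2.0])
model_b = ([0.4, 0.4, 0.2], [2.0, 8.0, 14.0], [1.0, 1.0, 1.5])

v = [-1, 0, 0.5, 1, 5, 6, 6.5, 7, 14]
print("P(A | v) =", posterior_a(v, model_a, model_b))
print("P(B | v) =", 1.0 - posterior_a(v, model_a, model_b))
```

The result is a probability only in the sense of "A versus B, given that one of the two generated the data". It says nothing about whether either model fits well in absolute terms.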

Thanks in advance for your advice.
Best regards,
Emilio.

#### eelioss

##### New Member
Any ideas?

My questions are:

How can I measure the goodness of fit between a few samples and a non-Gaussian distribution?

Given two possible distributions: What is the probability that a few samples were generated by each of them?
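
For the goodness-of-fit question, one bin-free option is the Kolmogorov–Smirnov distance between the empirical CDF of the samples and the CDF of the mixture. The Python sketch below is only an illustration: the parameters in `model_a` are hypothetical, and the value it returns is a distance (smaller means a closer fit), not a probability.

```python
import math

def gmm_cdf(x, weights, means, sds):
    """CDF of a 1-D Gaussian mixture: weighted sum of normal CDFs."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        for w, m, s in zip(weights, means, sds)
    )

def ks_statistic(values, cdf):
    """Largest gap between the empirical CDF and the model CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x,
        # so both sides of the step are checked.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Hypothetical parameters (weights, means, standard deviations).
model_a = ([0.5, 0.3, 0.2], [0.0, 5.0, 10.0], [1.0, 1.5, 2.0])

v = [-1, 0, 0.5, 1, 5, 6, 6.5, 7, 14]
print("KS distance to A:", ks_statistic(v, lambda x: gmm_cdf(x, *model_a)))
```

Computing the same distance against each candidate distribution gives a way to compare them without choosing a bin size.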

Thanks,
Emilio.