Dear all,
I have approximated the distributions of two one-dimensional random variables \( A \) and \( B \) with Gaussian mixture models (3 Gaussians). I used the Matlab function gmdistribution.fit with 10000 values. The resulting densities are called \( f_{A}(x) \) and \( f_{B}(x) \) (shown in the attached Figure 1, with \( f_{A}(x) \) in red and \( f_{B}(x) \) in blue).
Now I have a few values (e.g. v=[-1 0 0.5 1 5 6 6.5 7 14], as in Figure 1). This vector produces a very sparse histogram, since there are not many values.
How probable is it that these values were generated by the \( f_{A}(x) \) distribution? Or, how probable is it that they were generated by the \( f_{B}(x) \) distribution? I would like to obtain a probability value in order to assign the set of values to category A (\( f_{A}(x) \) distribution) or category B (\( f_{B}(x) \) distribution).
I had several ideas:
-> Joint distribution (product of probabilities)... but what happens with outliers? Since \( f_{A}(x) \) and \( f_{B}(x) \) are approximations of the real distributions, an outlier might produce zero probability for some value of x (see attached Figure 2). So I am not sure about this approach.
-> Average probability: This is a trivial solution I thought of, and it is probably wrong.
-> Hypothesis tests (chi-squared or Kolmogorov–Smirnov): In the case of chi-squared, the data must be binned... what is the optimal size of these bins? In addition, these hypothesis tests produce a p-value, which is not the probability value I am looking for (as far as I understand).
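To make the first idea concrete, here is a minimal sketch of the product-of-densities approach, computed as a sum of log-densities. It is only an illustration under stated assumptions: the mixture parameters below are made up (they are not the ones fitted by gmdistribution.fit), the values in v are treated as independent, and A and B are given equal prior probability. Note that a Gaussian mixture density is strictly positive everywhere, so an exact zero for an outlier comes from floating-point underflow; working in log space avoids it.

```python
import math

def gmm_logpdf(x, weights, means, sigmas):
    """Log density of a 1-D Gaussian mixture at x, computed with log-sum-exp."""
    terms = [
        math.log(w) - 0.5 * math.log(2 * math.pi * s * s) - 0.5 * ((x - m) / s) ** 2
        for w, m, s in zip(weights, means, sigmas)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def log_likelihood(values, gmm):
    """Sum of log densities, i.e. the log of the product of densities."""
    return sum(gmm_logpdf(x, *gmm) for x in values)

def posterior_a(values, gmm_a, gmm_b, prior_a=0.5):
    """P(A | values) by Bayes' rule; prior_a = 0.5 is an assumption."""
    la = log_likelihood(values, gmm_a) + math.log(prior_a)
    lb = log_likelihood(values, gmm_b) + math.log(1 - prior_a)
    top = max(la, lb)
    ea, eb = math.exp(la - top), math.exp(lb - top)
    return ea / (ea + eb)

# Hypothetical parameters (weights, means, standard deviations),
# standing in for the fitted 3-component models.
gmm_a = ([0.3, 0.4, 0.3], [0.0, 6.0, 12.0], [1.0, 1.5, 2.0])
gmm_b = ([0.5, 0.3, 0.2], [2.0, 9.0, 15.0], [1.0, 2.0, 1.0])

v = [-1, 0, 0.5, 1, 5, 6, 6.5, 7, 14]
p_a = posterior_a(v, gmm_a, gmm_b)
print("P(A | v) =", p_a, " P(B | v) =", 1 - p_a)
```

The result is a number between 0 and 1 that can be thresholded at 0.5 to choose a category. A single extreme value can still dominate the sum of log-densities, so the outlier concern is not removed by this sketch, only the numerical zero.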
Thanks in advance for your advice.
Best regards,
Emliio.