Normal? Bimodal normal? Some other distribution?

#1
Hello! I am a newbie here. First I want to say HI to everyone! :)

And I do have a question here.

I am trying to model data that should ideally follow a normal distribution, but I suspect it might be bimodal. I am having trouble posting a picture here, so please bear with me while I describe what the histogram of the data looks like.

Imagine the horizontal axis has values ranging from -6 to 5. The first peak (about 3.3%) is at about -3.7, after which the frequency drops until it reaches 1.6% at about 0. It then rises slightly to a second peak of 1.8% at about 0.5. From there the histogram continues like that of a normal distribution.

Now my questions are:
Can I treat data with a histogram like this as normal? If not, do I have to perform a multimodality test, and how? If the test shows it is bimodal, can I do anything to separate the two normal distributions (and then treat each of them individually)? If it might be fitted by some other distribution, what could it be?

Thanks for your attention and I am looking forward to hearing your expert comments!
 
#2
I'm no expert but have given your questions some thought. Here's my $0.02:

I presume you want to model your data to derive some statistics or probabilities based on the sample you have collected. From your description your data does not appear to be normally distributed, and there are a number of tests to confirm this. I am unaware of a multi-modal distribution that will "fit" your dataset as described, but why not just generate one yourself? The idea behind any probability distribution is that the density is never negative and the area beneath the density curve is one. You could build a model with multiple humps that correspond to the peaks in your dataset, e.g. a fourth-order polynomial could work, provided it stays non-negative over the range of your data. Scale the function so that its integral over that range equals 1. You can then use it as the statistical distribution of your data.
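
To make the "area must equal 1" step concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the quartic's coefficients are picked by hand so that its humps sit at -3.7 and 0.5 (they are not fitted to any data), the names `trapezoid`, `make_density` and `quartic_shape` are made up, and the integral is done with a simple trapezoid rule.

```python
def trapezoid(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the trapezoid rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def make_density(shape, a, b):
    """Rescale a non-negative shape function so its area on [a, b] is 1."""
    area = trapezoid(shape, a, b)
    if area <= 0:
        raise ValueError("shape must have positive area on [a, b]")
    return lambda x: shape(x) / area if a <= x <= b else 0.0


def quartic_shape(x):
    """Hypothetical fourth-order polynomial with humps at -3.7 and 0.5."""
    value = 1.0 - ((x + 3.7) * (x - 0.5)) ** 2 / 1600.0
    return max(value, 0.0)  # a density cannot be negative, so clamp at zero


LOW, HIGH = -6.0, 5.0  # range of the histogram described in post #1
density = make_density(quartic_shape, LOW, HIGH)

total_area = trapezoid(density, LOW, HIGH)
prob_below_zero = trapezoid(density, LOW, 0.0)
print(f"area under density: {total_area:.6f}")
print(f"P(X < 0) under this model: {prob_below_zero:.3f}")
```

A hand-picked quartic like this gives two humps of equal height, so it will not match peaks of 3.3% and 1.8%. The sketch only shows the normalization step, not a good fit.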
 

Masteras

TS Contributor
#3
If you use SPSS, run a normality test: Analyze > Nonparametric Tests > K-S test, put the variable in, and click OK.
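
For anyone without SPSS, the same one-sample K-S check can be sketched in plain Python. This is only an illustration under assumptions: the data are simulated (60% around -3.7 and 40% around 0.5, loosely following the histogram in post #1), the helper names `ks_statistic`, `ks_pvalue` and `normal_ks` are made up here, and the p-value uses the large-sample Kolmogorov formula. Because the mean and standard deviation are estimated from the same sample, that p-value tends to come out too large, so treat it as a rough guide.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of sample and a reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Asymptotic Kolmogorov p-value (an approximation for finite n)."""
    t = d * math.sqrt(n)
    if t < 0.2:
        return 1.0  # the series is numerically 1 here
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def normal_ks(sample):
    """K-S test of sample against a normal with the sample's mean and sd."""
    fitted = NormalDist(fmean(sample), stdev(sample))
    d = ks_statistic(sample, fitted.cdf)
    return d, ks_pvalue(d, len(sample))


rng = random.Random(0)  # fixed seed so the simulated data are reproducible
# Simulated stand-in for the bimodal data (assumed shape, not the real data).
bimodal = [rng.gauss(-3.7, 1.0) if rng.random() < 0.6 else rng.gauss(0.5, 1.0)
           for _ in range(500)]
# A genuinely normal sample for comparison.
control = [rng.gauss(0.0, 1.0) for _ in range(500)]

d_bimodal, p_bimodal = normal_ks(bimodal)
d_control, p_control = normal_ks(control)
print(f"bimodal sample: D = {d_bimodal:.3f}, p = {p_bimodal:.4g}")
print(f"normal sample:  D = {d_control:.3f}, p = {p_control:.4g}")
```

In practice a library routine such as `scipy.stats.kstest` would usually be used instead of hand-written code like this.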
 
#4
Thanks! I understand your point, but what if the data I plotted have already been transformed once, say from y -> ln(y)? Also, how would a non-normal distribution of the dependent variable affect a prediction model that involves independent variables?
 
#5
I can't see that it would make a difference, but I don't really know. I suggest testing/comparing against one or, better, two normal distributions representing your bimodal data.
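
One possible way to do that comparison, sketched in plain Python: fit a single normal and a two-component normal mixture, then compare them with BIC (lower is better). This is a sketch under assumptions, not a definitive method. The data are simulated to resemble the histogram in post #1, the function names are made up, the mixture is fitted with a basic EM loop started from the lower and upper halves of the sorted data, and BIC is my choice of criterion rather than something proposed in this thread.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def loglik_one_normal(data):
    """Log-likelihood of the maximum-likelihood single normal fit."""
    fit = NormalDist(fmean(data), pstdev(data))
    return sum(math.log(fit.pdf(x)) for x in data)


def fit_two_normals(data, iterations=200):
    """Fit a two-component normal mixture by EM.

    Returns (weights, means, sds, log_likelihood).
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Crude starting point: lower half vs upper half of the sorted data.
    w = [0.5, 0.5]
    mu = [fmean(xs[:half]), fmean(xs[half:])]
    sd = [pstdev(xs[:half]), pstdev(xs[half:])]
    for _ in range(iterations):
        comp = [NormalDist(mu[k], sd[k]) for k in range(2)]
        # E-step: probability that each point came from component 0.
        r0 = []
        for x in xs:
            p0 = w[0] * comp[0].pdf(x)
            p1 = w[1] * comp[1].pdf(x)
            r0.append(p0 / (p0 + p1))
        # M-step: re-estimate weights, means and sds from those probabilities.
        n0 = sum(r0)
        n1 = n - n0
        mu = [sum(r * x for r, x in zip(r0, xs)) / n0,
              sum((1 - r) * x for r, x in zip(r0, xs)) / n1]
        var0 = sum(r * (x - mu[0]) ** 2 for r, x in zip(r0, xs)) / n0
        var1 = sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(r0, xs)) / n1
        # Floor the sds so a component cannot collapse onto a single point.
        sd = [max(math.sqrt(var0), 1e-6), max(math.sqrt(var1), 1e-6)]
        w = [n0 / n, n1 / n]
    comp = [NormalDist(mu[k], sd[k]) for k in range(2)]
    loglik = sum(math.log(w[0] * comp[0].pdf(x) + w[1] * comp[1].pdf(x))
                 for x in xs)
    return w, mu, sd, loglik


def bic(loglik, n_params, n_points):
    """Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n_points) - 2.0 * loglik


rng = random.Random(0)  # fixed seed so the simulated data are reproducible
# Simulated stand-in for the bimodal data (assumed shape, not the real data).
data = [rng.gauss(-3.7, 1.0) if rng.random() < 0.6 else rng.gauss(0.5, 1.0)
        for _ in range(500)]

weights, means, sds, ll_two = fit_two_normals(data)
bic_one = bic(loglik_one_normal(data), 2, len(data))  # mean, sd
bic_two = bic(ll_two, 5, len(data))                   # 2 means, 2 sds, 1 weight
print(f"one normal:  BIC = {bic_one:.1f}")
print(f"two normals: BIC = {bic_two:.1f}")
print(f"estimated means: {means[0]:.2f}, {means[1]:.2f}")
```

If the two-normal fit wins clearly, the E-step probabilities could be kept and used to assign each observation to one of the two groups, which is one way of "separating" them. A library implementation such as scikit-learn's `GaussianMixture` would normally be preferable to a hand-written EM loop.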