Getting same value (.000) of Sig. (or 'p') in Shapiro-Wilk Normality test

#1
Greetings to fellow talk stat members, I'm facing an issue with the significance (or 'p') value in the Shapiro-Wilk normality test. Even after performing various data transformations (log, etc.), the p value is always displayed as .000. Any help is appreciated.

Thanks in advance.
 

Miner

TS Contributor
#2
What is your sample size? With a very large sample, the data can fail a normality test even though they look very normal and are close enough to normal to meet the assumptions of other tests. Try a normality plot to see whether there are major deviations from a straight line.

Another reason that you might see this is if you have a mixture of different populations. Again, a normality plot may help show whether this is an issue.
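To put a rough number on "deviations from a straight line", here is a minimal sketch that builds the points of a normal Q-Q plot and measures how straight they are. It is only an illustration: the three samples are simulated, the function names are made up for this post, and the sample size of 1097 is the one reported later in this thread. It uses the Python standard library rather than SPSS.

```python
import random
from statistics import NormalDist, mean


def qq_points(data):
    """Return (theoretical, observed) pairs for a normal Q-Q plot."""
    n = len(data)
    std_normal = NormalDist()
    observed = sorted(data)
    # Plotting positions (i - 0.5) / n keep the probabilities inside (0, 1).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def straightness(points):
    """Pearson correlation of the Q-Q points; 1.0 is a perfectly straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the simulated samples are repeatable
n = 1097

# Simulated examples (assumptions, not the poster's data):
normal_sample = [rng.gauss(50, 10) for _ in range(n)]
skewed_sample = [rng.lognormvariate(0, 1) for _ in range(n)]
mixture_sample = [
    rng.gauss(0, 1) if rng.random() < 0.5 else rng.gauss(6, 1) for _ in range(n)
]

r_normal = straightness(qq_points(normal_sample))
r_skewed = straightness(qq_points(skewed_sample))
r_mixture = straightness(qq_points(mixture_sample))

print(f"normal:  {r_normal:.4f}")
print(f"skewed:  {r_skewed:.4f}")
print(f"mixture: {r_mixture:.4f}")
```

The normal sample should score very close to 1, while the skewed sample and the two-population mixture should score lower. Plotting the same points will also show the shape of the departure (a curve for skew, an S-shape or step for a mixture), which the single number hides.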
 

noetsi

No cake for spunky
#3
Please post a thread on a single board only, not multiple ones. It is possible that the transformations make no difference. But in addition, SPSS only prints a fixed number of decimal places unless you change the default, so .000001 may well appear as .000. You will find a 1 or another nonzero digit eventually if you display enough places. The transformation may have made a difference, but one too small to show up at the number of places you have requested.
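The rounding effect is easy to reproduce outside SPSS. This is a small sketch, assuming plain rounding to three decimals; the helper names are invented for illustration and are not SPSS functions.

```python
def fixed_places(p, places=3):
    """Format p with a fixed number of decimals and no leading zero."""
    text = f"{p:.{places}f}"
    return text[1:] if text.startswith("0.") else text


def report_p(p, places=3):
    """Report very small p values as '< .001' instead of a misleading '.000'."""
    threshold = 10 ** -places
    if p < threshold:
        return f"< {fixed_places(threshold, places)}"
    return fixed_places(p, places)


for p in (0.000001, 0.0004, 0.0321):
    print(fixed_places(p), "|", report_p(p), "|", fixed_places(p, places=8))
```

Both 0.000001 and 0.0004 print as .000 at three places even though they differ by a factor of 400, which is why two transformations can look identical in the output table. A displayed .000 is never an exact zero; it is usually reported as p < .001.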
 

Englund

TS Contributor
#4
You could always drop the normality assumption and perform analyses that don't require normality.
 
#5
Thank you for the reply Miner. My sample contains 1097 rows of data, which I believe are not normally distributed after looking at the histogram. I have attached the histogram and the normal and detrended normal plots, please have a look.

I'm a newbie in statistics, so I don't have much idea of how to proceed.
 
#6
Thank you for the reply Englund. Actually I need to perform a time series analysis on the data, but if I proceed without any normalization steps, the predictions come nowhere near the data I have for cross-verification. Please help.
 
#7
Thank you for the reply noetsi, I definitely won't be repeating that mistake again. In the case you have mentioned, the p value will still be less than 0.05, right? So we will have to reject the null hypothesis that the data are normal, and I believe I will have to apply some transformation to the data. Please help.
 
#9
yep, that's far from normal
why does it concern you that it's not normal, maybe because it's the error from a regression model?
Hi ted, thanks for your interest. Actually I need to perform a time series analysis. If I proceed without any transformation of the data, the predicted values are not good. But when I apply a transformation I get the same value (i.e. .000) for the Shapiro-Wilk test, which means that there is still some problem with the data. Pardon me if I'm missing something in my steps above, as I'm completely new to statistics.
 
#10
sorry, I didn't see your previous post saying this was for time series. Not all time series analysis models require the noise part to be normal. Can you tell us more about what you're doing?
 
#11
It's ok ted :) My requirement is to perform a time series analysis, but I'm not getting good predicted values; they are nowhere near the dataset I have for cross-validation. I then started on the transformation step, applying logarithmic and square root operations separately, but in both cases there is no change in the significance value of the Shapiro-Wilk test; every time it comes out as .000. I don't see how to proceed after this.
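Since a rounded significance value cannot show whether a transformation moved the data at all, one coarse alternative is to compare skewness before and after. The sketch below is only an illustration on simulated right-skewed data (not the data from this thread), and the `skewness` helper is written out by hand rather than taken from SPSS.

```python
import math
import random


def skewness(values):
    """Moment-based sample skewness: 0 for symmetric data, > 0 for a long right tail."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(1)  # fixed seed so the simulated sample is repeatable

# Simulated, strictly positive, right-skewed data (an assumption for illustration).
raw = [rng.lognormvariate(3, 1) for _ in range(1097)]
rooted = [math.sqrt(v) for v in raw]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_sqrt = skewness(rooted)
skew_log = skewness(logged)

print(f"raw:  {skew_raw:.2f}")
print(f"sqrt: {skew_sqrt:.2f}")
print(f"log:  {skew_log:.2f}")
```

On data like this the square root reduces the skew and the log reduces it further, so the two transformations are clearly not equivalent even if a test prints .000 for both. Whether either transformation suits a given series depends on its actual shape, which the histogram and normal plot show better than any single statistic.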
 
#12
Hi pankaj12, forgive me if this seems a bit redundant, but please let me just try to summarize and ask Q's to make sure I understand:

1) you performed a time series analysis -- can you briefly summarize?
2) you used your time series model to make predictions on a cross-validation dataset
3) the predictions on the cross-validation dataset do not match the actual values very well at all
4) you transformed the fitting dataset in at least two different ways and repeated 1)-3) above and did not see improvement

Please tell me if I have these steps incorrect. Where does the Shapiro-Wilk test fit in with your analysis flow? And what do the values represent that you put into this test (model errors, cross-validation errors, etc.)?
 
#13
Hi ted, you got everything correct. Coming to your questions, here is a brief explanation of what I did:

1) For the time series analysis I used SPSS Modeler's Expert Modeling option, without changing anything in the dataset provided to me. But the predicted values I get from this are nowhere near the dataset values provided for cross-validation.

2) Later I searched on the internet and learned about data normalization through a YouTube video. Then, using SPSS Statistics, I tried normalizing the data; the same video mentioned that the Shapiro-Wilk test is one of the criteria by which it can be confirmed that the data satisfy the condition of being normal.

3) I performed the above steps and thought that I could provide the transformed (normalized) dataset to SPSS Modeler, but the Shapiro-Wilk significance value in the above step always comes out as .000, which has completely stopped me from proceeding.

Please have a look at the attached snapshot for the descriptive values I'm getting in SPSS Statistics.
 
#14
Thanks, that's helpful
can you confirm whether this is the correct documentation for the software you're using? I'm not familiar with SPSS, so hopefully someone with experience in it can chime in.

What I'm trying to figure out is exactly what model/method you're using, because not all time series analysis approaches require normal data. Despite what's sometimes taught in introductory-level courses, transforming non-normal data is rarely the answer. It seems either this is a bad model for the data (and thus why I'm trying to understand the model), or the cross-validation and model-fitting data aren't representative of one another. And there's always the possibility that the predictions on the cross-validation data weren't done correctly -- we've all done this before, including me.
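One quick sanity check on the "bad model or bad predictions" question is to score the model's hold-out predictions against a naive forecast that just repeats the last training value. This is a minimal sketch with made-up numbers; `model_predictions` is a hypothetical stand-in for whatever the software exported, not real output.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(train, horizon):
    """Repeat the last observed training value for every future step."""
    return [train[-1]] * horizon


# Made-up series for illustration only.
train = [112, 118, 132, 129, 121, 135]
holdout = [148, 148, 136]
model_predictions = [60, 62, 61]  # hypothetical model output

naive_mae = mean_absolute_error(holdout, naive_forecast(train, len(holdout)))
model_mae = mean_absolute_error(holdout, model_predictions)

print(f"naive MAE: {naive_mae:.1f}")
print(f"model MAE: {model_mae:.1f}")
```

If the model does much worse than the naive forecast, as in these invented numbers, that points toward a mismatch between the fitting and hold-out data or a mistake in how the predictions were produced (wrong dates, wrong scale, an un-reversed transformation) rather than toward non-normality.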
 
#15
Yes ted, the above link is indeed the documentation for IBM SPSS Modeler; ftp://public.dhe.ibm.com/software/analytics/spss/documentation/statistics/20.0/en/client/Manuals/IBM_SPSS_Statistics_Brief_Guide.pdf is the documentation for IBM SPSS Statistics. I'll explore this issue further, and if I find a solution I'll definitely post it here. Thanks for all your valuable inputs. :)
 

noetsi

No cake for spunky
#16
Normality gets less emphasis in time series, from what I have seen; in some cases, such as exponential smoothing, it is largely ignored. And that is one of the most common (and apparently accurate) time series methods used. Note that this is true of univariate models; if you have predictors it may matter more, I don't know.

Always remember that normality only affects the confidence intervals and p values, nothing else.
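To illustrate the exponential smoothing point, here is a minimal sketch of simple exponential smoothing written from the textbook recursion. It is not claimed to be what SPSS fits; the series and the smoothing weight `alpha` are made-up values. Nothing in the point forecast relies on a distributional assumption.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, starting from the
    first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    levels = [series[0]]
    for y in series[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels


def forecast(series, alpha, horizon):
    """Flat forecast: the last smoothed level repeated for each future step."""
    last_level = simple_exponential_smoothing(series, alpha)[-1]
    return [last_level] * horizon


series = [10, 12, 11, 13]  # made-up data
print(simple_exponential_smoothing(series, alpha=0.5))
print(forecast(series, alpha=0.5, horizon=2))
```

Normality of the errors would only come in if you wanted prediction intervals around that flat forecast, which matches the remark above about confidence intervals and p values.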