Which regression model to use with skewed & ordinal data?

#1
I'm a bit stuck and wondering which regression technique to use for my dependent variable. Respondents were asked to provide a percentage value from 0 to 100. Unfortunately, these percentages are not available in the dataset; they were grouped into nine ordered categories (0-8). The categories are meant to represent a performance measure, ranging from 0 ('low performer') to 8 ('highest performer'). Note that the categories cover value ranges of different widths, so they are not equal intervals. In general, I'm interested in estimating the effect of some ordinal IVs on performance. Judging from the descriptive statistics and frequencies, my DV appears to be heavily right-skewed. In particular, one third of all observations (N=400) fall into category 0, whereas the rest appear to be bell-shaped.

Many thanks in advance!

Chris
 

noetsi

No cake for spunky
#2
You can try ordinal logistic regression, although with nine levels you may not meet its assumptions. There is a test of the special assumption of ordinal regression, the proportional odds assumption (which from memory is that the predictors have the same effect at each cut point of the DV). I can find the output you look at to see if the assumption is met in SAS if that helps. If you are not using SAS, you can probably find similar output in other software. Normally, with this many levels, you might find that ordinal and linear regression generate similar results. I have been told that beyond a certain number of distinct levels, ordinal logistic regression begins to break down.
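
To make the assumption concrete, here is a minimal sketch in plain Python (not SAS or SPSS) of how a proportional odds model turns one slope and a set of thresholds into category probabilities. The thresholds and the slope below are made-up illustrative values, not estimates from any data; the first threshold is chosen only so that roughly one third of cases land in category 0 when the predictor is 0.

```python
import math

def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(x, beta, thresholds):
    """P(Y = j | x) under a proportional odds model.

    The model states P(Y <= j | x) = logistic(theta_j - beta * x):
    one threshold theta_j per cut point, but a single slope beta
    shared by all cut points.
    """
    cum = [logistic(t - beta * x) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical values: 8 increasing thresholds for 9 categories (0-8).
thresholds = [-0.7, 0.0, 0.6, 1.2, 1.8, 2.4, 3.0, 3.8]
beta = 0.5

for x in (0, 1, 2):
    probs = category_probs(x, beta, thresholds)
    print(x, [round(p, 3) for p in probs])
```

Raising x shifts every cumulative log-odds by the same amount, which is exactly what the test of the assumption checks.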

Another option is multinomial logistic regression, which treats the DV as nominal. You lose the ordering information, but a number of the problems that exist in ordinal logistic regression do not apply.
 
#3
@noetsi
Thanks a lot.
Ordinal logistic regression seems preferable, but the crux is finding out whether the proportional odds assumption is met...
Any idea how to test it? Is there an indicator you know of?

I only have SPSS at hand...
 

noetsi

No cake for spunky
#4
SPSS calls it (the test of proportional odds) the Test of Parallel Lines, I believe. Go to the bottom of this link to see it described:

http://www.ats.ucla.edu/stat/spss/output/ologit.htm

Allison cautions that this test (at least in SAS) tends to reject the null (meaning it signals that the assumption is violated) more often than it should. Things like adding, deleting, or respecifying variables can change the result of this test as well, which is a pain...
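
Because the formal test can be touchy, an informal look is sometimes useful alongside it. The following is only a rough sketch in plain Python, not the SPSS Test of Parallel Lines and not a formal test of any kind: it fits a separate binary logit for each cut point (Y > j) and prints the slopes, which should be roughly similar if proportional odds holds. The data are simulated, and the helper names, thresholds, slope, and seed are all hypothetical.

```python
import math
import random

def fit_binary_logit(xs, ys, iters=25):
    """Fit logit P(y = 1) = a + b * x by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient w.r.t. intercept
            g1 += (y - p) * x      # gradient w.r.t. slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = gradient
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def simulate(n, beta, thresholds, seed=1):
    """Simulate an ordinal predictor (0-4) and a proportional odds outcome."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.randint(0, 4)
        cum = [1.0 / (1.0 + math.exp(-(t - beta * x))) for t in thresholds]
        u = rng.random()
        ys.append(sum(u > c for c in cum))  # category 0..8
        xs.append(x)
    return xs, ys

# Hypothetical setup: 9 categories, N = 400, true common slope 0.8.
thresholds = [-0.7, 0.0, 0.6, 1.2, 1.8, 2.4, 3.0, 3.8]
xs, ys = simulate(400, 0.8, thresholds)

slopes = []
for j in range(8):
    binary = [1 if y > j else 0 for y in ys]
    a, b = fit_binary_logit(xs, binary)
    slopes.append(b)
    print(f"Y > {j}: slope = {b:.2f}")
```

Slopes that drift steadily or change sign across cut points would hint at a violation; slopes that merely wobble are what sampling noise alone produces.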

To correct something I said earlier: ordinal regression is not harder to interpret as the number of levels goes up, but you need more cases to estimate the model correctly (Allison says 10 per level of the DV, ignoring the issue of how much data you need generally to use logistic regression correctly).

It's actually multinomial regression that gets harder to interpret as the number of levels increases, although it does not have the proportional odds assumption.
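
A quick way to see why is to count the parameters each model estimates. This is a small sketch in Python with hypothetical function names; it assumes each predictor enters as a single numeric term (a dummy-coded ordinal IV would count once per dummy).

```python
def n_params_ordinal(k_levels, n_predictors):
    """Proportional odds model: k-1 thresholds plus one slope per predictor."""
    return (k_levels - 1) + n_predictors

def n_params_multinomial(k_levels, n_predictors):
    """Multinomial logit: an intercept and a full set of slopes
    for each of the k-1 non-reference levels."""
    return (k_levels - 1) * (1 + n_predictors)

# Nine DV levels, as in this thread, with 1, 3, and 5 predictors.
for p in (1, 3, 5):
    print(p, n_params_ordinal(9, p), n_params_multinomial(9, p))
```

With nine levels and three predictors, that is 11 parameters for the ordinal model against 32 for the multinomial one, and every multinomial slope has to be read relative to the reference category.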