I'm struggling to decide how to normalise my data for modeling.
I am dealing with crowdfunded projects, and a large chunk of them (15%) raised between $0 and $10 and therefore failed. These produce a very strong positive skew that I have not been able to normalise (I tried log, z-score, and cubic transformations), and I don't think they add value to my model.
Therefore I decided to remove them, although these outliers are valid observations.
I tried Winsorizing them, but suspected that was wrong, so I concluded that my only option is to trim the data by dropping the top and bottom 10% of values.
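To make the difference between the two options concrete, here is a minimal sketch on synthetic data. The helper names `winsorize` and `trim`, the lognormal distribution, and the sample size are illustrative assumptions, not the real dataset or pipeline.

```python
import numpy as np

def winsorize(values, lower_pct=10, upper_pct=90):
    """Clamp values outside the given percentiles to the percentile bounds."""
    lo, hi = np.percentile(values, [lower_pct, upper_pct])
    return np.clip(values, lo, hi)

def trim(values, lower_pct=10, upper_pct=90):
    """Drop values outside the given percentiles."""
    lo, hi = np.percentile(values, [lower_pct, upper_pct])
    return values[(values >= lo) & (values <= hi)]

# Synthetic stand-in for "amount raised": 15% of projects raise $0-$10,
# the rest follow a right-skewed (lognormal) distribution. Assumed shape only.
rng = np.random.default_rng(0)
n = 1000
n_near_zero = n * 15 // 100
near_zero = rng.uniform(0, 10, size=n_near_zero)
funded = rng.lognormal(mean=7, sigma=1.5, size=n - n_near_zero)
raised = np.concatenate([near_zero, funded])

winsorized = winsorize(raised)  # same number of rows, extremes clamped
trimmed = trim(raised)          # fewer rows, extremes removed

print(len(raised), len(winsorized), len(trimmed))
```

Winsorizing keeps every project but replaces the extreme amounts with the percentile bounds, whereas trimming discards those projects entirely, including the near-zero failures.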
Is this approach correct, or is there a better method?
Would a non-parametric model perhaps be more appropriate?