I now have a similar but somewhat different problem: I need to model weighted average ratios. For example, say I have a dataset of employees; every employee has an allowance of available hours and can dedicate a certain % of those to charity or similar tasks. I need to model the ratio hours dedicated / hours available, but this time I am less interested in predicting specific individuals than in predicting the overall % of hours at an aggregate level. E.g. given that 10,000 employees in city A dedicated 10% of their hours, what % of their hours will the 5,000 employees in another city dedicate? The aggregate percentage is a weighted average, with hours available as the weights: if you have 2 hours available and I have 200, getting my prediction wrong is a much more serious error than getting yours wrong.
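To make the weighting concrete, here is a small sketch with made-up numbers (the arrays and variable names are purely illustrative, not my real data). It shows that the aggregate ratio is the hours-weighted mean of the individual percentages, and how far that can be from the plain mean:

```python
import numpy as np

# Made-up example: four employees (illustrative values only)
hours_available = np.array([2.0, 200.0, 40.0, 40.0])
hours_dedicated = np.array([2.0, 0.0, 4.0, 0.0])

# Individual percentages: 100%, 0%, 10%, 0%
individual_pct = hours_dedicated / hours_available

# Plain mean treats the 2-hour and the 200-hour employee alike
unweighted_mean = individual_pct.mean()

# Weighting by hours available reproduces the aggregate ratio
weighted_mean = np.average(individual_pct, weights=hours_available)
aggregate_ratio = hours_dedicated.sum() / hours_available.sum()

print(f"unweighted mean of %: {unweighted_mean:.4f}")
print(f"weighted mean of %:   {weighted_mean:.4f}")
print(f"total dedicated / total available: {aggregate_ratio:.4f}")
```

The last two numbers coincide, and that aggregate figure is the quantity I want to predict.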

Any suggestions on how I could go about this? Also, the distribution of the observed % is very skewed: it is exactly 0 in about 70% of the cases, 100% in 20% of the cases, and anything in between in the remaining 10%.

I was thinking of using a GLM with a logit link, but on its own this doesn't account for the fact that different observations should carry different weights.
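For reference, this is a rough sketch of what I imagine a weighted version could look like, assuming hours available are used as prior weights in the logit likelihood. That choice is my assumption, not something I have validated, and I don't know whether it copes with the mass at 0 and 100%. The function `fit_weighted_logit`, the city indicator and the data are all hypothetical:

```python
import numpy as np

def fit_weighted_logit(X, y, w, n_iter=50, tol=1e-10):
    """Newton-Raphson fit of a logit model for fractional y in [0, 1],
    with each observation's contribution scaled by its prior weight w."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))               # fitted proportions
        grad = X.T @ (w * (y - mu))                        # weighted score
        hess = X.T @ (X * (w * mu * (1.0 - mu))[:, None])  # weighted information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Made-up data: four employees in city A, two in city B
hours_available = np.array([2.0, 200.0, 40.0, 40.0, 10.0, 100.0])
pct_dedicated = np.array([1.0, 0.0, 0.1, 0.0, 1.0, 0.2])
city_b = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])

# Design matrix: intercept plus city indicator
X = np.column_stack([np.ones_like(city_b), city_b])
beta = fit_weighted_logit(X, pct_dedicated, hours_available)

# Predicted aggregate % per city
pred_a = 1.0 / (1.0 + np.exp(-beta[0]))
pred_b = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1])))
print(f"city A: {pred_a:.4f}, city B: {pred_b:.4f}")
```

With only a city indicator as predictor, the fitted values are the hours-weighted averages within each city, which is at least the behaviour I am after at the aggregate level.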

As for software, I use JMP by SAS (www.jmp.com) but also have access to Python and R.

Thank you!