independent variable that is a distribution?



I am attempting a logistic regression analysis with a binary response. Some explanatory variables are measured at the individual level, and others are measured only within subgroups of the observations (nests). For each nest I know the distribution of income among its individual observations in the form of a histogram, but not the individual incomes.

I could enter each histogram value as a separate variable (e.g. percent in class 1), but I would rather fit a distribution to the histogram for each nest. Then, similar to a regression with random effects (but in a sense the diametric opposite), a parameter estimate would be assigned to this variable. Analogous to integrating over the random-effect distribution when maximizing the likelihood, the estimation would integrate over the known distribution for each nest while estimating the coefficient attached to income and the coefficients of the other independent variables.

I want to do this because I believe that income interacts with the relationships between the other independent variables and my dependent variable, but I do not know the actual incomes, only their distribution over a subset of observations. In doing so, I hope to partially overcome what is called the ecological fallacy.

Is this approach a known method? I have not run across it. It seems more computationally intensive than the random effects model, since all parameters in the model would have to be estimated while integrating over the pre-specified distribution for each nest. At least the distribution itself would not have to be estimated as well.

Any food for thought on other ways to overcome the ecological fallacy (correlation)?

Thanks.
-Seth
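To make the integration idea concrete, here is a minimal sketch, not an established method. It assumes each nest's histogram is treated as a discrete distribution over bin midpoints (a simplification of fitting a continuous distribution), so the integral becomes a weighted sum over bins. The names `marginal_loglik`, `histograms`, `midpoints`, the single individual-level covariate `z`, and all numbers are hypothetical and only for illustration.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def marginal_loglik(beta, data, histograms, midpoints):
    """Log-likelihood with unobserved income summed out per nest.

    beta       : (b0, b_z, b_inc, b_int), hypothetical coefficients for the
                 intercept, covariate z, income, and the z-by-income interaction
    data       : list of (y, z, nest_id) with y in {0, 1}
    histograms : dict nest_id -> list of bin proportions summing to 1
    midpoints  : income-bin midpoints (assumed identical bins in every nest)
    """
    b0, b_z, b_inc, b_int = beta
    total = 0.0
    for y, z, nest in data:
        lik = 0.0
        # Weighted average of the Bernoulli likelihood over the nest's bins.
        for p_k, m_k in zip(histograms[nest], midpoints):
            prob1 = sigmoid(b0 + b_z * z + b_inc * m_k + b_int * z * m_k)
            lik += p_k * (prob1 if y == 1 else 1.0 - prob1)
        total += math.log(lik)
    return total

# Made-up illustration data: two nests, three income bins.
midpoints = [1.0, 3.0, 5.0]
histograms = {"A": [0.6, 0.3, 0.1], "B": [0.1, 0.3, 0.6]}
data = [(1, 0.5, "A"), (0, -1.2, "A"), (1, 2.0, "B"), (0, 0.3, "B")]

ll = marginal_loglik((-0.5, 0.8, 0.2, -0.1), data, histograms, midpoints)
print(ll)
```

Estimates would then come from maximizing this function over `beta` with a general-purpose optimizer. When the income and interaction coefficients are both zero, the histogram drops out and the expression reduces to the ordinary logistic log-likelihood, which is a useful sanity check.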