Comparing Variable Importance for Sample Subset

#1
Hello all,

I am trying to figure out a way to compare the variable importance between a sample and a subsection of that sample using regression models.

I have a sample of survey responses (40k) and a subsection of that sample (3k) that comes from a specific geographical location.
My team has built regression models and used relative importance to find which variables matter most in predicting the responses.

We want to use the same techniques to see whether variables in the subsection (a certain geographical location) have higher or lower importance than in the full sample. Additionally, we would like to do this by calculating a z statistic so we can compare them at a set confidence level.
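
To make the z statistic idea concrete, here is a minimal sketch of the arithmetic, written in Python purely for illustration (the same calculation carries over to R). It assumes each importance estimate already comes with a standard error, for example from refitting the model on bootstrap resamples; how those standard errors are obtained is not covered here. It also assumes the two groups do not overlap, so the geographic subsection (3k) would be compared against the remaining responses rather than against the full 40k, which contains it. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def importance_z_test(imp_region, se_region, imp_rest, se_rest):
    """Two-sided z test for the difference between two importance estimates.

    Assumption: the estimates come from non-overlapping groups and each
    standard error was estimated separately (e.g. via bootstrap refits).
    """
    diff = imp_region - imp_rest
    # Standard error of a difference of two independent estimates
    se_diff = math.sqrt(se_region ** 2 + se_rest ** 2)
    z = diff / se_diff
    # Two-sided p-value under a standard normal reference distribution
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical importance values and standard errors, for illustration only
z, p = importance_z_test(imp_region=0.30, se_region=0.04,
                         imp_rest=0.20, se_rest=0.03)
print(f"z = {z:.2f}, p = {p:.4f}")
```

If many variables are tested this way, the confidence level would normally need a multiple-comparison adjustment.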

Does anyone know a way to do this? I am using R and my models are built using the randomForest package.

Thanks for the help :)

Brian
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
So you are using randomForest with a continuous dependent variable and calling that regression? How are your IVs formatted?
 

hlsmith

#4
What is the purpose of the analyses? Why not fit a linear regression model? Splitting the continuous variable into buckets likely loses information and can overfit the data.