Help on Analyzing Employee Engagement Survey and Test Type

#1
Hi All,

Thank you in advance for your help with this question. I am a former PhD with a heavy quantitative background, but I am currently in a management role. We are rolling out an employee engagement survey with 100+ variables. In addition to building the descriptive portions, dashboards, etc., my study relies on some inferential analyses to determine which of the variables are driving a lack of engagement. Based on the statistically significant findings, we are going to conduct focus groups to build our action plans.

The primary dependent variable will be a new ordinal variable called engagement, which is formed from 5 other variables and uses a 5-point scale from Strongly Disengaged to Strongly Engaged.

*Alternatively, we could transform this into a numeric variable treated as interval, on a scale from 5-25 (the sum of the 5 item scores for each employee, with each item coded 1-5).
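
To make the two codings of the dependent variable concrete, here is a minimal sketch. It is in Python purely for illustration. The function names, the 1-5 item coding, and the rule of rounding the item mean to get the ordinal level are all my assumptions, not a settled method. With items coded 1-5 the raw sum runs from 5 to 25; subtracting 5 would give a zero-based scale.

```python
def composite_score(items):
    """Sum of the five item responses (assumed coding: integers 1-5)."""
    if len(items) != 5 or any(x not in (1, 2, 3, 4, 5) for x in items):
        raise ValueError("expected five responses coded 1-5")
    return sum(items)


def ordinal_level(items):
    """Collapse the five items into one 1-5 level by rounding their mean.

    1 = Strongly Disengaged, 5 = Strongly Engaged. This rounding rule is
    an assumption. The mean of five integers is a multiple of 0.2, so it
    never lands exactly on a .5 tie.
    """
    return round(composite_score(items) / 5)


# Hypothetical responses for two employees, for illustration only.
responses = {"emp_a": [4, 5, 4, 3, 5], "emp_b": [1, 2, 2, 1, 3]}
for emp, items in responses.items():
    print(emp, composite_score(items), ordinal_level(items))
```

The sum would feed a linear regression, while the rounded level would feed an ordered logistic regression.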

I'd like to run an ordered logistic regression on this and am wondering if anyone has a recommended approach for handling the ordinal variables. Alternatively, we could run a linear regression if the dependent variable can be treated as interval, as stated in * above.

We have a handful of categorical variables (10 or so) from our demographic data and a few interval variables, such as our employees' years of tenure.
Another concern is the 100 independent engagement variables, all of which are on a 5-point Likert scale from Strongly Disagree to Strongly Agree and thus are ordinal. I've heard this approach can work if we treat the variables as interval (1-5), but I don't have experience with this. Overall, including the categorical and interval variables, we'd have around 110-115 variables.

A few direct questions:

-Does anyone have experience with which approach works best here? How are these types of engagement surveys typically analyzed? My hunch is that they typically are not, because most organizations aren't in the inferential phase, but we'd like to do this with our survey.

-Is the number of independent variables large enough that overfitting is a concern? I know some techniques to cut down the model size and improve fit, but would welcome others. We will likely reach around 60% or more of our population, which is around 2200 employees.

-Does anyone know of any studies on companies that have done this? I feel most consulting companies keep their methodologies pretty close to the vest, but someone may have published something.

-For R users: is there any useful R code my team and I can piggyback on for either ordered logistic or multiple linear regression?

-General comments/concerns are welcome as well.