# Regression analysis across multiple data sets

#### Ianrbruce

##### New Member
Hi All. I'm new to this forum but I have a background in stats and in business/social science research.

I'm in the early stages of designing a research study that will analyze a large dataset for about 100 publicly traded companies. There will be about 10-20 independent variables for each company, each relating to business activities and performance metrics for a given month. I am trying to find out if any of these are predictive of business outcomes - revenue, stock price, etc.

I will have about 18 months of data for each company, showing the month-by-month results for each of the 10-20 variables, along with accompanying stock price, revenue data, etc.

I understand how to run regressions for each one of the 100 companies individually. My question is this: how do I aggregate or combine the results across ALL 100 companies, to get an overall result which will presumably be more powerful?

(Apologies if I'm not giving enough detail here, happy to answer any questions!)

#### ondansetron

##### TS Contributor
You'll want to look into a mixed model, then. Treat a company as a random effect.
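To make that concrete, here is a minimal sketch of a random-intercept model fitted to all companies at once, using `statsmodels`. The data are synthetic, and the column names (`awareness`, `preference`, `revenue`) and effect sizes are assumptions for illustration only, not anything from your study.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the real panel: 100 companies x 18 months (long format).
rng = np.random.default_rng(0)
n_companies, n_months = 100, 18
company = np.repeat(np.arange(n_companies), n_months)
month = np.tile(np.arange(n_months), n_companies)

# Hypothetical predictors; the real study would have 10-20 of these.
awareness = rng.normal(size=company.size)
preference = rng.normal(size=company.size)

# Each company has its own baseline level: this is what the random intercept absorbs.
company_effect = rng.normal(scale=2.0, size=n_companies)[company]
revenue = (5.0 + 1.5 * awareness + 0.5 * preference
           + company_effect + rng.normal(size=company.size))

panel = pd.DataFrame({
    "company": company,
    "month": month,
    "awareness": awareness,
    "preference": preference,
    "revenue": revenue,
})

# One model for all companies: slopes are shared (fixed effects),
# while the intercept varies by company (random effect).
model = smf.mixedlm("revenue ~ awareness + preference",
                    data=panel, groups=panel["company"])
result = model.fit()
print(result.summary())
```

The fixed-effect slopes are the "across all companies" estimates. If you expect the effect of a predictor to differ by company, `mixedlm` also accepts a `re_formula` argument (for example `re_formula="~awareness"`) to add a random slope.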

What is the outcome of interest?

#### ondansetron

##### TS Contributor
You might want to consider dimension reduction or penalized regression first, since you have so many potential predictors.
I would start with theory to group the variables into broad theoretical drivers of a particular outcome. For example, higher free cash flow to equity is theoretically associated with a higher stock price, all else constant, and higher free cash flow to the firm with higher firm value overall; likewise, industry type and leverage tend to go together and might make stock prices more volatile. This way you can think broadly about the groups before jamming things in as predictors. Principal components might be a good way to reduce dimensions, either by scoring each group to get an overall picture of, say, value, or at least by showing you which few variables carry most of the information in the set. Those can then be used in modeling the outcome.
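As a rough sketch of the scoring idea, the snippet below runs PCA on one group of related variables and keeps the first component as a single group score. It uses scikit-learn on synthetic data; the four variables are assumed to be noisy measurements of one underlying driver, which is an assumption made only to keep the example small.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_obs = 1800  # e.g. 100 companies x 18 months, stacked

# One hypothetical latent driver, measured by four noisy variables.
latent = rng.normal(size=n_obs)
group = np.column_stack(
    [latent + 0.3 * rng.normal(size=n_obs) for _ in range(4)]
)

# Standardize first so no variable dominates because of its units.
scaled = StandardScaler().fit_transform(group)

pca = PCA(n_components=2)
scores = pca.fit_transform(scaled)

# Share of the group's variance captured by each component.
print(pca.explained_variance_ratio_)
# Loadings of the first component: which variables it mostly represents.
print(pca.components_[0])

# The first component can serve as a single score for the whole group.
group_score = scores[:, 0]
```

If the first component captures most of the variance, `group_score` can replace the four original columns as one predictor in the outcome model.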

Another approach would be to ignore all of that and use a LASSO or similar shrinkage/regularization procedure to select the variables for you while forcing some to zero (this is a good example of how to approach a large number of potential predictors).
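A minimal sketch of that, using scikit-learn's `LassoCV` on synthetic data where only the first three of 20 predictors actually matter (the coefficients are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n_obs, n_predictors = 1800, 20
X = rng.normal(size=(n_obs, n_predictors))

# Assumed ground truth: only predictors 0, 1 and 2 affect the outcome.
true_coef = np.zeros(n_predictors)
true_coef[:3] = [2.0, -1.5, 1.0]
y = X @ true_coef + rng.normal(size=n_obs)

# The penalty is scale-sensitive, so standardize the predictors first.
X_scaled = StandardScaler().fit_transform(X)

# Cross-validation chooses the penalty strength alpha.
lasso = LassoCV(cv=5, random_state=0).fit(X_scaled, y)

# Predictors whose coefficients were not shrunk to exactly zero.
selected = np.flatnonzero(lasso.coef_)
print("chosen alpha:", lasso.alpha_)
print("selected predictors:", selected)
```

One caveat: plain k-fold cross-validation as used here ignores the fact that rows from the same company are related and ordered in time, so for panel data like this the selection should be treated as exploratory.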

#### Ianrbruce

##### New Member
> You'll want to look into a mixed model, then. Treat a company as a random effect.
>
> What is the outcome of interest?

My independent variables all relate to brand attributes like awareness, perception and preference. I'm presuming that some combination of these things predicts business outcomes later in time. I can associate some of these variables into these three classes to reduce the dimensions a bit.
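For the "later in time" part, one way to set this up is to lag each predictor within its own company before modeling, so that this month's outcome is paired with last month's brand metric. A small pandas sketch with made-up numbers and hypothetical column names:

```python
import pandas as pd

# Tiny made-up panel in long format: two companies, four months each.
panel = pd.DataFrame({
    "company": ["A"] * 4 + ["B"] * 4,
    "month": [1, 2, 3, 4] * 2,
    "awareness": [10, 11, 12, 13, 20, 21, 22, 23],
    "revenue": [100, 104, 109, 115, 200, 203, 207, 212],
})

# Sort so that shifting moves values forward in time within each company.
panel = panel.sort_values(["company", "month"])

# Lag by one month per company; a lag never crosses company boundaries.
panel["awareness_lag1"] = panel.groupby("company")["awareness"].shift(1)

# The first month of each company has no lagged value, so drop it.
model_data = panel.dropna(subset=["awareness_lag1"])
print(model_data)
```

The one-month lag is only an assumption; the appropriate lag length would have to come from theory or from trying several.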

My real question is how I look across all the companies, not one company at a time. Can you point me to a discussion of mixed models and how this might solve the problem?

#### Ianrbruce

##### New Member
> You might want to consider dimension reduction or penalized regression first, since you have so many potential predictors.
> I would start with theory to group the variables into broad theoretical drivers of a particular outcome. For example, higher free cash flow to equity is theoretically associated with a higher stock price, all else constant, and higher free cash flow to the firm with higher firm value overall; likewise, industry type and leverage tend to go together and might make stock prices more volatile. This way you can think broadly about the groups before jamming things in as predictors. Principal components might be a good way to reduce dimensions, either by scoring each group to get an overall picture of, say, value, or at least by showing you which few variables carry most of the information in the set. Those can then be used in modeling the outcome.
>
> Another approach would be to ignore all of that and use a LASSO or similar shrinkage/regularization procedure to select the variables for you while forcing some to zero (this is a good example of how to approach a large number of potential predictors).

Thanks. I understand how to reduce the set of independent variables (which are all brand attributes), based on either some hypothesis I have or some analysis of the data itself. My question is different: how do I analyze results across 100 companies, instead of one company at a time? In other words, Companies A, B, C, D, E, etc. all have associated stock price data, revenue, etc. (one or some combination of these is my dependent variable). I am gathering data about brand attributes for each company. I know how to do a regression on each company in turn, but how do I look across ALL the companies to discern a relationship?