Problems with a panel data model: correlations and time lags


Hello everyone!

I will try to be precise and concise.

I'm studying the impact of environment-related technologies on CO2 emissions reduction (the dependent variable). To do so, I'm working with panel data covering 27 countries and 19 years (2000 to 2018). My panel is in long format.
I have 19 independent variables:
  • 9 subsets of patent applications which are highly correlated with each other because some technologies are part of several subsets
  • 9 specialization indexes: one for each of these subsets
  • 1 concentration index
And I have 5 control variables.
I am considering running a fixed effects or a random effects model, depending on the result of the Hausman test. But first, I have some issues.

***** PROBLEMS *****
1° I computed pairwise correlations, and some of them are not statistically significant. Is that a problem?

2° Because of the high correlation among my independent variables, I will run 9 models, one for each of the 9 subsets. Would you advise (a) including in a single model only the variables that concern the same technology, i.e. in the first model the patent counts of subset 1, the specialization index of subset 1 and the concentration index; or (b) including the patent counts of subset 1 together with all the other independent variables (i.e. the 9 specialization indexes and the concentration index)? When I test both, the estimates are obviously different. Do you know how I could choose between them?
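Written out for subset k, the first option (same-technology variables only) and the second option (all 9 specialization indexes) would look roughly as follows. The symbols are my own shorthand, introduced only for illustration: E for emissions, P for patent counts, S for specialization indexes, C for the concentration index, X for the 5 control variables, and alpha_i for the country effect (fixed or random).

```latex
% Option (a): only the variables of subset k
E_{it} = \beta_1 P^{(k)}_{it} + \beta_2 S^{(k)}_{it} + \beta_3 C_{it}
         + \gamma' X_{it} + \alpha_i + \varepsilon_{it}

% Option (b): patents of subset k with all 9 specialization indexes
E_{it} = \beta_1 P^{(k)}_{it} + \sum_{j=1}^{9} \delta_j S^{(j)}_{it} + \beta_3 C_{it}
         + \gamma' X_{it} + \alpha_i + \varepsilon_{it}
```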

3° I would like to lag the patent variables, i.e. to consider the effect of patents in, e.g., year t-3 on GHG emissions in year t. Can I do that by adding '3-year period' time dummies? If so, how? When I add 'one-year' time dummies, some of these dummies lead to multicollinearity.
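To make concrete what I mean by a lag in 3°, here is a minimal sketch in plain Python with made-up numbers: the patent count is shifted by three years within each country, so that the row for year t carries the value from year t-3. The function name `lag_within_panel`, the variable name `patents` and the toy values are hypothetical and are not my actual data.

```python
def lag_within_panel(rows, var, k):
    """Return copies of `rows` with an extra key f"{var}_lag{k}".

    rows: list of dicts in long format, each with "country", "year" and `var`.
    The lagged value is looked up by (country, year - k), so a missing year
    gives None instead of borrowing a value from another year or country.
    """
    lookup = {(r["country"], r["year"]): r[var] for r in rows}
    out = []
    for r in rows:
        lagged_row = dict(r)  # copy, so the input rows are left untouched
        lagged_row[f"{var}_lag{k}"] = lookup.get((r["country"], r["year"] - k))
        out.append(lagged_row)
    return out


# Hypothetical toy panel: 2 countries, 5 years each (2000 to 2004)
toy_series = {"AT": [10, 12, 15, 11, 9], "BE": [3, 4, 6, 8, 7]}
panel = [
    {"country": country, "year": year, "patents": value}
    for country, series in toy_series.items()
    for year, value in zip(range(2000, 2005), series)
]

lagged = lag_within_panel(panel, "patents", 3)
for r in lagged:
    print(r["country"], r["year"], r["patents"], r["patents_lag3"])
```

Each lag costs observations: with a 3-year lag, the first three years of every country have no lagged value, so my 19 years would leave 16 usable years per country.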

The image below shows an example of the 'first difference' output, using the patents of subset_1 together with all other variables and the one-year dummies.

Thank you so much for your help!
[Attached screenshots: Capture d’écran 2021-01-03 à 19.03.27.png, Capture d’écran 2021-01-01 à 21.57.26.png]