Did I do this correctly?

#1
Hi,

I am trying to forecast electric power demand for natural gas in the near future. A huge driver of electric power demand for natural gas on a monthly basis is weather, specifically Cold Demand Days when you need to use your air conditioning. I have weather forecasts for Cold Demand Days by month for next year.

Also, over the past 10 years, the annual increase in electric power demand for natural gas has risen 3-5% a year as power plants use more natural gas (and less coal), and I see this trend continuing.

So…I ran a multiple regression in Excel with my dependent variable as Natural Gas Used Per Day and my independent variables as 1) Cold Demand Days and 2) a Time Variable when starts at “1” in Jan-2001 and ends with “99” in Mar-2009.

Here are the results:

Regression Statistics
Adjusted R Square 0.90
Standard Error 1.41
Observations 99

Coefficients Standard Error t Stat
Intercept 9.71 0.31 30.88
Time Variable 0.05 0.00 9.77
CDD Variable 0.03 0.00 28.76


I am not a Stats expert, so I wanted to run this by you guys to see if everything made sense. Do you think there's anything I missed? Did I handle the Time Variable correctly?

Thank you very much!
 
#2
And a follow-up question: I'm also trying to run a regression for Residential power demand for natural gas. When I do the same type of multiple regression, my Time Variable only returns a T-Stat of -1.83. Would I be better off not including this variable?
 

CB

Super Moderator
#3
And a follow-up question: I'm also trying to run a regression for Residential power demand for natural gas. When I do the same type of multiple regression, my Time Variable only returns a T-Stat of -1.83. Would I be better off not including this variable?
What is the p value associated with the t statistic? (Sorry, I'm too lazy to lock it up myself :p) If the p value indicates that the coefficient is not "statistically significant" you *might* want to delete it - though also consider the actual size of the coefficient and how much it adds to the R-squared.

Overall your analysis seems to make sense, the adjusted R-squared in the initial analysis is very impressive (to a psychology researcher anyway - to us an R-squared of 0.30 for a 2 variable model is cause to jump with joy!)

In terms of handling the time variable, my main question would be whether the relationship between time and consumption was/is genuinely linear - remember that the regression can only capture linear effects. I.e. the general trend may have been 3-5% increases, but what I'd imagine there'd be in the data (without looking) would be large-ish increases in the early 00's, and then smaller increases or even decreases as the recession took hold.

As well as the time/linearity issue you probably need to check that the other assumptions of multiple regression are met (or not too badly violated). I'm not sure how easy this is in Excel - you may want to see if you can get access to SPSS, or another specialised statistics program.
 

CB

Super Moderator
#4
Have looked up p for a t value of -1.83 (df 2), 2-tailed p = 0.2087... yeah, definitely not "statistically significant", so you might want to leave the variable out - but again, consider how much it adds to the R-squared. If it's a lot and you have adequate theoretical/literature justification to include it, I think it'd be ok to leave it in. Null hypothesis testing isn't the be-all and end-all, after all! :eek:
 
#5
Great, thanks CowboyBear. I sincerely appreciate it.

Actually, according to the Excel regression, the p-value is 0.07 for the time variable...seems decent, no?
 

CB

Super Moderator
#6
Oh! Well, gosh, if the p value is 0.07... I would say that it's close enough to the usual alpha=0.05 for you to do what you feel is right (as long as there are other good reasons to have it in the model). :) A p value of 0.07 still indicates, after all, a pretty darn low chance that the observed effect would come up if the population effect was zero.