How do I validate my model in the absence of "measured or actual data"?

OjaP

New Member
#1
I am working on an electricity consumption forecast for a developing country and have built a model to estimate past and expected future electricity consumption. I want to validate this model, but I have no historical measured consumption data for this country to compare my modeled results against. Given that no measured or actual data exist, how else can I validate this model?

Thanks for your help.
 

hlsmith

Less is more. Stay pure. Stay poor.
#4
That is what I assumed, but could not understand why you did not hold some of those data out for model testing and validation. You can have your cake and eat it too.

Perhaps, if you are very savvy, you could create synthetic data based on other countries to examine performance as well, but assumptions would be needed.
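A minimal sketch of the hold-out idea, assuming some observed yearly series is available (which data were meant is not shown in this thread). The function names, the linear-trend stand-in model, and all numbers are hypothetical and only illustrate the mechanics: fit on the earlier years, then score on the most recent years the model never saw.

```python
def fit_linear_trend(years, values):
    """Stand-in model (an assumption): ordinary least-squares straight line."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda year: intercept + slope * year


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def holdout_score(years, values, n_test, fit=fit_linear_trend):
    """Fit on all but the last n_test points, then score on those held-out points."""
    train_years, test_years = years[:-n_test], years[-n_test:]
    train_vals, test_vals = values[:-n_test], values[-n_test:]
    model = fit(train_years, train_vals)
    predictions = [model(y) for y in test_years]
    return mape(test_vals, predictions)


# Made-up yearly series, purely for illustration.
years = list(range(2000, 2012))
values = [100 + 5 * (y - 2000) for y in years]
print(holdout_score(years, values, n_test=3))
```

The split keeps time order (the test years come last) rather than sampling at random, since a forecast is judged on years after the ones used for fitting.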
 

OjaP

New Member
#5
Your suggestion of creating synthetic data based on other countries seems interesting. How do I go about creating synthetic data from other countries to examine performance, and what kinds of assumptions would be needed?
 

hlsmith

Less is more. Stay pure. Stay poor.
#7
Synthetic controls are becoming very popular for interrupted time series. Say you have data for Los Angeles across time, and there has been an interruption (e.g., COVID, a policy change). You would compare the change in rate in LA with that of another city that did not have the interruption. If no single city is similar to LA, you could weight Sacramento, Phoenix, and Oakland in combination to match the characteristics of LA, and use that weighted combination's rate change, or lack of one, as the comparator. I would imagine you could find a comparable country, or a combination of other countries similar to your target, and see how the model performs on it.
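The weighting idea can be sketched in a few lines. This is only an illustration, not how any synthetic control package implements it: the function name `synthetic_weights`, the brute-force grid search, and all numbers are assumptions made up for the example. It looks for non-negative weights that sum to 1 and make the weighted donor series track the target series as closely as possible over the pre-interruption period.

```python
from itertools import product


def synthetic_weights(target, donors, step=0.05):
    """Grid-search non-negative weights summing to 1 that minimise the squared
    gap between the target series and the weighted combination of donor series."""
    n = round(1 / step)  # number of grid steps between 0 and 1
    best_weights, best_loss = None, float("inf")
    # Choose the first len(donors) - 1 weights on the grid; the last is the remainder.
    for combo in product(range(n + 1), repeat=len(donors) - 1):
        if sum(combo) > n:
            continue  # weights would exceed 1 in total
        weights = [c / n for c in combo] + [(n - sum(combo)) / n]
        loss = sum(
            (t - sum(w * d[i] for w, d in zip(weights, donors))) ** 2
            for i, t in enumerate(target)
        )
        if loss < best_loss:
            best_weights, best_loss = weights, loss
    return best_weights, best_loss


# Made-up pre-period series for three donor units and one target unit.
donors = [
    [10, 12, 14, 16],
    [20, 21, 22, 23],
    [5, 5, 6, 6],
]
target = [12.0, 13.3, 14.8, 16.1]

weights, loss = synthetic_weights(target, donors)
print(weights, loss)
```

A grid search is workable for three donors but scales badly; with many donors a constrained optimiser would be the more usual choice. For the forecasting question, the same weights could be applied to the donors' measured consumption to build a comparison series for the model's output, on the assumption that the donors really do resemble the target country.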

I have not used this method yet, but will be taking the following workshop to learn more about it.

Workshop Details
July 9, 2021
10:00am – 2:00pm MT
Instructors:
Roch Nianogo
Tarik Benmarhnia
Target Audience: Beginner


An introduction to Difference-in-Differences and Synthetic Control Methods for Epidemiologists
The interest in and use of quasi-experimental methods to evaluate the impact of a health policy or program on some disease or outcome of interest has drastically increased in the epidemiological literature. Some designs exploit the specific timing and place of an intervention implementation as a natural experiment. In this context, difference-in-differences, interrupted time series, and recent synthetic control methods have been used. In this workshop, we propose an overview of different quasi-experimental methods covering the historical context, the identification assumptions under the potential outcomes framework, and the different steps to implement such methods using various case studies. This workshop will introduce the theory and practice on the what, why, and how to implement Difference-in-Differences and Synthetic Control Methods in R/RStudio. Attendees will work individually on hands-on programming exercises.

Offered at the Society for Epidemiologic Research conference.
 

OjaP

New Member
#8
Thanks for this. I appreciate it.
 
#9
You might want to look at intervention models in ARIMA (Hays et al. covered this) or, probably better, segmented regression, which includes an intervention dummy and lets you control for other factors since it is a regression. Interrupted time series is not really a statistical method in itself.
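A minimal sketch of segmented regression with an intervention dummy, assuming the common form y_t = b0 + b1*t + b2*D_t + b3*(t - t0)*D_t, where D_t is 1 from the intervention time t0 onward. The function names and the noise-free example series are made up for illustration, and the fit uses plain least squares via the normal equations rather than any particular statistics package.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def segmented_fit(y, t0):
    """Least-squares fit of intercept, pre-trend, level change, and slope change."""
    rows = [
        [1.0, float(t), float(t >= t0), float(max(t - t0, 0))]
        for t in range(len(y))
    ]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up series: baseline 10 + 2t, then a level jump of 5 and extra slope 1.5 from t0 = 6.
t0 = 6
y = [10 + 2 * t + (5 + 1.5 * (t - t0) if t >= t0 else 0) for t in range(12)]

intercept, pre_slope, level_change, slope_change = segmented_fit(y, t0)
print(intercept, pre_slope, level_change, slope_change)
```

Other factors could be controlled for by appending them as extra columns in `rows`. This sketch ignores autocorrelation in the errors, which a real time series analysis would need to address.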