One Column of Dates, Two Different Formats

#1
In excel, all startdate observations are in a similar format:
Code:
"2006-10-10 10:10:10"
When the file is loaded into stata, some observations are in one format and others are in a different format (the overall startdate variable is a string):
Code:
"2006-10-10 10:10:10" or "10/10/2006 10:10"
I thought that the stata date functions would take care of this. So, I converted date into a numeric value. As I only needed the month and year of observation I did this:
Code:
gen startmonth = mofd(date(startdate, "YMDhms"))
format %tm startmonth
In the end, this doesn't seem to work for half the observations!
"2006-10-10 10:10:10" becomes "2006m10"
"10/10/2006 10:10" becomes "-" (or null)

What I've discovered is that a null value is returned when I convert the (string) startdate into a numeric value.


What can I do so that all observations in the second format ("10/10/2006 10:10:10") become "2006m10"?
 

bukharin

RoboStataRaptor
#2
It sounds like the problem lies in the conversion from Excel to Stata (or possibly the way the dates are stored in Excel). How are you doing the conversion?

In the meantime a quick workaround would be:
Code:
gen startmonth = mofd(date(startdate, "YMDhms"))
replace startmonth = mofd(date(startdate, "MDYhm")) if missing(startmonth)
(or perhaps it should be DMY - that's the problem with the 10/10/2010 format)
 
#3
bukharin, you are amazing. Thanks!

Although the conversion problem may be in excel, my file is 500mb - too big to load/change all the values. I'm doing the conversion through Stata import.