creating dummy variable

mmmm

New Member
#1
Hi,

I need some help with Stata. I have a data set on several thousand persons, and for each person and each year since 1950 I want to create a dummy variable that indicates whether the person possessed a car in that particular year.

The data include
• whether a person has ever owned a car
• year of car purchase
• whether a person still owns the car
• reasons for no longer possessing it (such as accident, sale, etc.)
• year of the accident, sale, etc.

Could someone please help me to generate the desired results?

regards
mmm
 

bukharin

RoboStataRaptor
#2
It would help to post the structure of your data.

Also, do you want a whole bunch of variables like car1950, car1951 etc, or do you want the data in "long" format (person id, year, car)?
 

mmmm

New Member
#3
I would like to have the data in the long format.

By structure, do you mean the following?
1. a dummy variable for whether a person has ever owned a car
2. if the answer to 1. is "yes", the next variable contains the year of car purchase
3. a dummy variable for whether a person still owns the car
4. if the answer to 3. is "no", the next variable takes one of 4 values, each indicating a reason for no longer possessing the car (such as accident, sale, etc.)
5. the last variable contains the year when the accident, sale, etc. happened
 

bukharin

RoboStataRaptor
#4
Okay let's assume your data is structured as follows:
Code:
id  everowned  yearbought  stillowns  reasondisposed  yeardisposed
1   1          1975        0          3               1998
Now you're going from 1950 to (say) 2011, which means you have 2011-1950+1 = 62 years of data. So you need 62 entries per person in long format. Then you could try something like:
Code:
expand 62
bysort id: gen year=_n + 1949
gen hascar=year>=yearbought & (year<=yeardisposed | stillowns==1) if !missing(yearbought, yeardisposed)
That will only work if each person is in the dataset only once (i.e. has only one car). It gets a little more complex if they can have 2 or more cars (but is of course still possible).
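To try the steps above on something concrete, here is a minimal sketch meant to be run as a do-file. The two persons are made-up toy data, and the variable names simply follow the assumed layout above; adjust both to your real file.

```stata
* Toy data (hypothetical values), one row per person
clear
input id everowned yearbought stillowns reasondisposed yeardisposed
1 1 1975 0 3 1998
2 1 1960 0 1 1962
end

expand 62                          // 62 copies of each person, one per year
bysort id: gen year = _n + 1949    // years 1950..2011 within each person
gen hascar = year >= yearbought & (year <= yeardisposed | stillowns == 1) ///
    if !missing(yearbought, yeardisposed)

* Spot-check the years around purchase and disposal for person 1
list id year hascar if id == 1 & inlist(year, 1974, 1975, 1998, 1999), noobs
```

For person 1 the listing should show hascar equal to 0 in 1974, 1 in 1975 and 1998, and 0 in 1999.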
 

mmmm

New Member
#5
Thank you very much for your answer. I'd like to make sure that I understand the code:

First, we generate a year-variable for each individual for each year since 1950.
Second, we generate the dummy-variable of possessing a car for each individual in each year
-if the year-variable is equal to or larger than the year of purchase
-and if the year-variable is equal to or smaller than the year of disposal, or the car is still owned, i.e. the "stillowns"-variable is equal to 1

Unfortunately, I don't get the last part "if !missing(yearbought, yeardisposed)". Could you please explain its meaning?

Moreover, the data also include information about a second car purchase & disposal. Do you have an idea how the code can be adapted?
 

bukharin

RoboStataRaptor
#6
Code:
gen hascar=year>=yearbought & (year<=yeardisposed | stillowns==1)
This will create a dummy variable equal to the logical (true/false) result of "year>=yearbought & (year<=yeardisposed | stillowns==1)"

Sometimes you get tripped up by missing data. Let's say yearbought and yeardisposed are both missing. Stata treats a missing value as larger than any number, so "year>=yearbought" is false, the whole expression evaluates as false, and "hascar" is set to 0. However, you don't actually know that they didn't have a car; that data is missing, so "hascar" should be missing too. The "if !missing(...)" qualifier means that "hascar" is only given a value if that observation has data for both yearbought and yeardisposed.
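A small sketch of that behaviour, using made-up rows evaluated in a single year (2000). It also shows one possible tightening of the qualifier, which is an assumption added here and not part of the code above: because the qualifier requires yeardisposed to be present, anyone who still owns their car (and so has no disposal year) also ends up with a missing "hascar", so the variant below only insists on yeardisposed when the car is no longer owned, and codes people with everowned==0 as 0.

```stata
* Toy data (hypothetical): 2 still owns a car, 3 never owned one,
* 4 has an unknown purchase year
clear
input id everowned yearbought stillowns yeardisposed
1 1 1975 0 1998
2 1 1990 1 .
3 0 .    . .
4 1 .    0 2001
end
gen year = 2000

display 2000 >= .    // 0: missing is larger than any number
display 2000 <= .    // 1

* Without a qualifier, person 4 is silently coded as 0
gen naive = year >= yearbought & (year <= yeardisposed | stillowns == 1)

* Assumed refinement: need yearbought, and yeardisposed only if not still owned
gen hascar = year >= yearbought & (year <= yeardisposed | stillowns == 1) ///
    if !missing(yearbought) & (stillowns == 1 | !missing(yeardisposed))
replace hascar = 0 if everowned == 0    // never owned a car: a known 0

list, noobs
```

Here "hascar" comes out as 0, 1, 0 and missing for persons 1 to 4, while "naive" reports 0 for person 4 even though nothing is known about when that car was bought.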

In terms of adapting the code it's pretty easy but depends on your data structure.

Let's say you have this:
Code:
id  everowned  yearbought  stillowns  reasondisposed  yeardisposed
1   1          1975        0          3               1998
1   1          1999        0          3               2004
You can do the same trick, but for each car; then combine the years for each person as follows:
Code:
bysort id: gen carnum=_n // each id can have multiple cars
expand 62
bysort id carnum: gen year=_n + 1949 // look at each year for each combo of id + carnum
gen hasthiscar=year>=yearbought & (year<=yeardisposed | stillowns==1) if !missing(yearbought, yeardisposed) // had that particular car in that year
egen hascar=max(hasthiscar), by(id year) // 1 if any car in that year
bysort id year: keep if _n==1 // only keep 1 record per person per year
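The same steps as a self-contained do-file sketch on made-up data (person 1 with two cars, person 2 with one). The final drop and the isid check are additions for illustration: after keeping one record per person and year, the car-level variables describe an arbitrary car, so they are removed.

```stata
* Toy data (hypothetical values), one row per car
clear
input id everowned yearbought stillowns reasondisposed yeardisposed
1 1 1975 0 3 1998
1 1 1999 0 3 2004
2 1 1980 0 1 1985
end

bysort id: gen carnum = _n                  // number the cars within each person
expand 62
bysort id carnum: gen year = _n + 1949      // 1950..2011 for each id + car
gen hasthiscar = year >= yearbought & (year <= yeardisposed | stillowns == 1) ///
    if !missing(yearbought, yeardisposed)
egen hascar = max(hasthiscar), by(id year)  // 1 if any car in that year
bysort id year: keep if _n == 1
drop carnum yearbought stillowns reasondisposed yeardisposed hasthiscar

isid id year                                // errors out unless id + year is unique
list if id == 1 & inrange(year, 1997, 2005), noobs
```

For person 1 the listing should show hascar equal to 1 from 1997 through 2004 (the first car until 1998, the second from 1999) and 0 in 2005.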
 

mmmm

New Member
#7
Thanks again! The structure of the data is


Code:
id  everowned  "year 1.bought" "1.stillowns"  "reason 1.disposed"  "year 1. disposed"  "year 2.bought" "2.stillowns"  "reason 2.disposed" "year 2.disposed"
1   1          1975            0               3                   1998                1999             0             3                   2004
If a person hasn't purchased a second car, the variables for the second car are simply empty (missing). How could the code be adapted?
 

bukharin

RoboStataRaptor
#8
Are there only 2 cars max, or do you want to extend the technique to include any number of cars?

If there are at most 2 cars, then the original method I sent is easily adapted as something like:
Code:
expand 62
bysort id: gen year=_n + 1949
gen hascar=year>=year1bought & (year<=year1disposed | stillowns1==1) if !missing(year1bought, year1disposed)
replace hascar=1 if year>=year2bought & (year<=year2disposed | stillowns2==1)
If you need to account for an unspecified number of cars then you should first use -reshape long- to make your dataset long (one row per person per car), then use the multi-car code I posted earlier (post #6).
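A rough sketch of that route, assuming the wide variables have first been renamed so that the car number is a suffix (yearbought1, yearbought2, and so on). Those names and the toy values are hypothetical; -reshape long- needs a common stub for each variable, so your actual names would have to be brought into a pattern like this first.

```stata
* Toy wide data (hypothetical names and values); person 2 has no second car
clear
input id everowned yearbought1 stillowns1 reasondisposed1 yeardisposed1 yearbought2 stillowns2 reasondisposed2 yeardisposed2
1 1 1975 0 3 1998 1999 0 3 2004
2 1 1980 0 1 1985 .    . . .
end

* One row per person per car; carnum takes the suffix values 1 and 2
reshape long yearbought stillowns reasondisposed yeardisposed, i(id) j(carnum)
drop if missing(yearbought) & carnum > 1    // no second car recorded

* From here on, the same steps as the multi-car code
expand 62
bysort id carnum: gen year = _n + 1949
gen hasthiscar = year >= yearbought & (year <= yeardisposed | stillowns == 1) ///
    if !missing(yearbought, yeardisposed)
egen hascar = max(hasthiscar), by(id year)
bysort id year: keep if _n == 1
keep id year hascar
```

Because the reshape handles whatever suffixes it finds, the same sketch should extend to a third or fourth car without changing the later steps.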