Probability of Breakdown

#1
I am considering using regression (either Cox or Logistic) analysis to estimate the probability of breakdown of vehicles in a fleet.

I have some evidence showing that likelihood of breakdown is influenced by the following variables:

sex (of owner)
age (of owner)
age (of vehicle)

amongst others.

Regression can be used to estimate the probability of breakdown at a given time based on some known assumptions about the vehicle owner.

I have hit a stumbling block in the following situation:

when a previous owner (say a 53yo male) gives the vehicle to a new owner (say a 20yo female).

Intuitively I feel that the previous owner's usage will affect the state of the vehicle, and hence alter the new owner's probability of breaking down, but I am unsure how to describe this mathematically.

I had considered averaging:

i.e. if male is represented by '1' and female by '0', then in this situation VAR is '0.5',
but this is just guesswork.

Any help would be appreciated.
 
#2
I was thinking about this more today and there are instances in my dataset where a vehicle may have been owned by as many as three previous owners during the sampling time:

say for instance:

User one is male 20yo and owned vehicle for 12 months
User two is male 34yo and owned the vehicle for 6 months
User three is female 60yo and owned vehicle for 2 months

If I were to give the values 2 and 1 to male and female respectively, I could then use a weighted average to find the current value (between 1 and 2) at time X months. The same could be done for user age, taking account of all previous owners and the length of time each owned the vehicle. The 'sex' values would then no longer be nominal but ordinal; is this OK for these regressions?
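To make the arithmetic concrete, here is a minimal Python sketch of that weighted average for the three owners listed above, with months of ownership as the weights. The names `owners`, `SEX_CODE` and `weighted_average` are made up for illustration, and the 2/1 coding is simply the one proposed in this post.

```python
# Ownership history from the example above: (sex, owner age, months owned).
# The coding male=2, female=1 follows the post and is an arbitrary choice.
SEX_CODE = {"male": 2, "female": 1}

owners = [
    ("male", 20, 12),
    ("male", 34, 6),
    ("female", 60, 2),
]


def weighted_average(values, weights):
    """Return sum(value * weight) / sum(weight)."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


months = [m for _, _, m in owners]
sex_avg = weighted_average([SEX_CODE[s] for s, _, _ in owners], months)
age_avg = weighted_average([a for _, a, _ in owners], months)

print(f"weighted sex code: {sex_avg:.2f}")   # 38 / 20 = 1.90
print(f"weighted owner age: {age_avg:.1f}")  # 564 / 20 = 28.2
```

So at 20 months the vehicle would carry a 'sex' value of 1.90 and an owner age of 28.2 under this scheme.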

I feel I should try it and see how it goes, but I wanted to ask in case I'm unwittingly making a mockery of the regression method (I am an inexperienced user).
 

Mean Joe

TS Contributor
#3
I wouldn't take the average value of sex (for records where the car has >1 owner). A couple of general reasons: the (weighted) average is a fairly weak measure, and it is better to keep all the data than to replace several records with one; also, sex=1.5 doesn't really have any meaning in the real world.

One suggestion for you:
Make a variable for age of vehicle at time of "purchase"
You can make several 0/1 variables to denote that the car had a previous owner: prev_male=1 if the previous owner of the car was male, prev_3040male=1 if the previous owner was a male 30-40 years old.
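A rough Python sketch of how such indicator variables might be built for one record is below. The function name `previous_owner_flags`, the extra `had_prev_owner` flag, the inclusive 30-40 age band and the vehicle age of 5 years are all assumptions for illustration; only `prev_male` and `prev_3040male` come from the suggestion above.

```python
def previous_owner_flags(prev_sex, prev_age):
    """Build 0/1 indicators for the previous owner.

    Pass prev_sex=None when the vehicle had no previous owner.
    """
    had_prev = prev_sex is not None
    is_male = had_prev and prev_sex == "male"
    return {
        "had_prev_owner": int(had_prev),
        "prev_male": int(is_male),
        # Assumption: the 30-40 band includes both endpoints.
        "prev_3040male": int(is_male and 30 <= prev_age <= 40),
    }


# Example from post #1: a 53yo male hands the vehicle to a 20yo female.
record = {
    "owner_sex": "female",
    "owner_age": 20,
    "vehicle_age_at_purchase": 5,  # hypothetical value, in years
    **previous_owner_flags("male", 53),
}

print(record)
```

Each vehicle-owner record keeps the current owner's own sex and age, and the previous owner enters the model only through the extra 0/1 columns.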

Another thing:
I think Cox regression has a way to handle covariates that change over time (e.g. you're finding records where the sex of the owner changes over time). I'm not too familiar with this, but it is probably a better way to handle your regression.
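For reference, time-varying covariates in a Cox model are usually supplied in a "start, stop, event" (counting-process) layout, with one row per interval during which the covariates stay constant. The sketch below only reshapes the three-owner example from post #2 into that layout. The function name `to_start_stop_rows`, the column names, and the assumption that the vehicle broke down at the end of the last interval are all hypothetical. Owner age is held fixed within each ownership period for simplicity.

```python
def to_start_stop_rows(vehicle_id, ownerships, broke_down):
    """Split one vehicle's history into one row per ownership period.

    ownerships: list of (sex, owner_age, months_owned), in chronological order.
    broke_down: True if a breakdown ended the last period, False if censored.
    """
    rows = []
    start = 0
    for i, (sex, age, months) in enumerate(ownerships):
        stop = start + months
        is_last = i == len(ownerships) - 1
        rows.append({
            "id": vehicle_id,
            "start": start,  # months since the vehicle entered the study
            "stop": stop,
            "male": int(sex == "male"),
            "owner_age": age,
            # The event can only occur in the final interval.
            "event": int(broke_down and is_last),
        })
        start = stop
    return rows


history = [("male", 20, 12), ("male", 34, 6), ("female", 60, 2)]
rows = to_start_stop_rows("vehicle_1", history, broke_down=True)

for row in rows:
    print(row)
```

Rows in this shape are what tools such as R's `coxph` with `Surv(start, stop, event)` or the Python lifelines `CoxTimeVaryingFitter` expect, so no averaging across owners is needed.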