# Mortality table for water heaters

#### Outlier

##### TS Contributor

I would like to use this sample to make statements of the form: given that your water heater has lasted X years, it has a 50-50 chance of lasting until Y years.
How do I correct this distribution for the small sample size so that I can make that prediction at some known confidence level?
Thank you.


#### nickleby

##### New Member
You could just take out the cases that are less than X, then calculate the median of that reduced sample. E.g., if X is 4 years, take out the two 2s, then calculate the median from 5 to 36 (equals 12). Thus you would arrive at "Given that your water heater has lasted for 4 years, you have a 50-50 chance of it lasting until 12 years." However, given your very small sample size, your confidence in this would be small, and would depend on the population size of water heaters as well as your desired margin of error.
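A minimal sketch of that calculation in Python. The ages below are hypothetical stand-ins chosen only to match the figures quoted above (two 2s, survivors running from 5 to 36); they are not the original sample, and `conditional_median` is an illustrative name.

```python
from statistics import median

def conditional_median(ages, x):
    """Median replacement age among units that lasted at least x years."""
    survivors = [a for a in ages if a >= x]
    if not survivors:
        raise ValueError("no unit in the sample lasted x years")
    return median(survivors)

# Hypothetical replacement ages in years (NOT the data from the thread).
ages = [2, 2, 5, 8, 10, 12, 15, 20, 36]

y = conditional_median(ages, 4)
print(f"Given 4 years of service, 50-50 chance of lasting until {y} years")
```

With only seven survivors in this made-up sample, the median rests on a single middle value, which is why the estimate is so unstable.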

#### Outlier

##### TS Contributor
Yes, that's how I have done it so far. The problem is: how do I find out how closely the shape of this CDF matches that of the 110 M water heaters in the US?
I'd be happy with 50% confidence on predictions like the one above.

For the replacement ages of residential HVAC equipment I have ~90 samples, and I am much more confident in the shape of the CDF.
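One standard way to attach a confidence level to a median from a small sample is the distribution-free interval built from order statistics: the number of observations falling below the true median is Binomial(n, 1/2), so a pair of sorted sample values brackets the median with a coverage that can be computed exactly. The sketch below assumes the ages are a random sample from the population of interest; the data and the name `median_interval` are hypothetical. Note that the population size (110 M) does not enter the calculation at all, only the sample size does.

```python
from math import comb

def median_interval(sample, confidence=0.5):
    """Narrowest symmetric order-statistic interval for the median.

    Returns (low, high, coverage), where coverage >= confidence is the
    exact probability that [low, high] contains the population median,
    assuming a random sample from a continuous distribution.
    """
    xs = sorted(sample)
    n = len(xs)
    best = None
    for d in range(1, n // 2 + 1):
        # P(fewer than d observations fall below the median), doubled
        # because the same tail applies on the upper side.
        tail = sum(comb(n, i) for i in range(d)) / 2 ** n
        coverage = 1 - 2 * tail
        if coverage < confidence:
            break
        best = (xs[d - 1], xs[n - d], coverage)
    if best is None:
        raise ValueError("sample too small for the requested confidence")
    return best

# Hypothetical ages of units that survived past 4 years (not the real data).
survivors = [5, 8, 10, 12, 15, 20, 36]
low, high, coverage = median_interval(survivors, confidence=0.5)
print(low, high, round(coverage, 3))
```

Asking for a higher confidence level with the same seven values widens the interval quickly, which is the small-sample penalty in concrete terms.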

#### fed1

##### TS Contributor
Confidence intervals for time-to-event data and median survival times are usually calculated using something called "Greenwood's formula". If you do not know what that is, it is OK, because nobody remembers it.

What you need is some stat software to calculate this for you. What have you got? You can download some, like R, for free!
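For reference, Greenwood's formula as it is usually stated, together with the Kaplan-Meier estimator it applies to. The notation is an assumption introduced here, not taken from the thread: $t_i$ are the distinct failure ages, $d_i$ is the number of failures at $t_i$, and $n_i$ is the number of units still in service just before $t_i$.

$$
\hat S(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right),
\qquad
\widehat{\operatorname{Var}}\big[\hat S(t)\big] = \hat S(t)^2 \sum_{t_i \le t} \frac{d_i}{n_i\,(n_i - d_i)}
$$

When every unit in the sample has already been replaced (no units still running), $\hat S(t)$ is simply the fraction of the sample that lasted longer than $t$.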

#### Outlier

##### TS Contributor
http://en.wikipedia.org/wiki/Kaplan–Meier_estimator

I do my own stat functions using Excel's packaged ones or I write my own.

#### fed1

##### TS Contributor
http://www.r-project.org/

Here it is. I think it automatically comes with the survival package.

You can try to calculate confidence intervals by hand. If you use the variance formula on the wiki page and treat the estimate of the survival function at each time as a normal variate with that variance, you will get intervals. The problem is that they won't be bounded by (0, 1). There is some trickery involved in getting bounded confidence intervals. I am pretty sure R will take care of this.
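A rough by-hand version in Python, for anyone who would rather write their own functions. It computes the Kaplan-Meier estimate, the Greenwood variance, the plain normal interval described above, and one common form of the "trickery": building the interval on the log(-log S) scale and transforming back, which keeps the bounds inside (0, 1). The data, the censoring flags, and the function name are hypothetical, and this is a sketch rather than a replacement for a tested package.

```python
from collections import Counter
from math import exp, log, sqrt

def kaplan_meier(times, events, z=1.96):
    """Kaplan-Meier estimate with Greenwood-based intervals.

    times:  age at failure, or age at last observation if still running
    events: 1 if the unit failed at that age, 0 if it was still running
    Returns rows of (t, S, plain_lo, plain_hi, loglog_lo, loglog_hi).
    """
    failures = Counter(t for t, e in zip(times, events) if e)
    rows, s, gw = [], 1.0, 0.0
    for t in sorted(failures):
        n = sum(1 for u in times if u >= t)  # units at risk just before t
        d = failures[t]
        if d == n:
            # Everyone remaining failed: S drops to 0, variance undefined.
            rows.append((t, 0.0, None, None, None, None))
            break
        s *= 1 - d / n
        gw += d / (n * (n - d))              # running Greenwood sum
        se = s * sqrt(gw)
        plain = (s - z * se, s + z * se)     # can leave the (0, 1) range
        k = exp(z * sqrt(gw) / abs(log(s)))  # interval on log(-log S) scale
        loglog = (s ** k, s ** (1 / k))      # always inside (0, 1)
        rows.append((t, s, *plain, *loglog))
    return rows

# Hypothetical ages in years; 0 marks a heater still running when observed.
times = [2, 2, 5, 8, 10, 12, 15, 20, 36]
events = [1, 1, 1, 1, 0, 1, 1, 0, 1]

rows = kaplan_meier(times, events)
for row in rows:
    print(row)
```

On this made-up sample the plain upper bound at the first failure age already exceeds 1, while the log-log bounds stay between 0 and 1.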

Good luck.

#### Outlier

##### TS Contributor
What I found is that, for psychological reasons, replacement age <= normal wear-out age [as evidenced by an increasing failure rate of HVAC components]. Twenty years is a common replacement age for residential HVAC equipment.

#### fed1

##### TS Contributor
I like to replace my water heater before it fails. I'm sure most people do.

#### Outlier

##### TS Contributor
If it is likely to fail gracefully, you can put a pan underneath with a \$16 water level detector and get the last minute of life out of it.
But you never know if it will fail catastrophically. YouTube has some videos of water heaters with the safety measures bypassed; they get some altitude.
It depends on your 'utility function' for risk vs. money vs. inconvenience.