How does one calculate distributions from imprecise time span data of differing levels of precision?

#1
Hi everybody

I am new to the forum and apologize in advance if I have missed a similar thread when searching the forum.

I have a very specific data set, for which I am trying to find a way to calculate a distribution.
It consists of age-at-death estimations of archaeological skeletal remains. Due to the nature of the methods employed, I don't have absolute ages, but age ranges within which an individual died. Those age ranges vary in precision depending on the age itself as well as the preservation of the remains.
My dataset looks something like this:
Individual Nr.   Age
1                12-16
2                14-20
3                18-30
4                11-13
... and so on
My main problem is that I don't know, and can't figure out, what this sort of data is called.
It is sort of interval censored, but I only have one observation per individual and the event has always occurred. So I guess it would be interval-censored data with large parts of the data set missing, e.g. everyone who has not yet died. Can I still treat it as interval-censored data in that case? If not, does anyone have a suggestion for how to handle this kind of data?
I usually do statistics in R, so if anyone has an idea of R packages able to cope with this kind of data, I would welcome the input. I have already used the survival package and the survfit function etc., but since I am not sure whether I can treat it as interval-censored data, I am not sure whether the results are an accurate representation of my data.

I am very grateful for your help and for every little hint that could lead me in the right direction.
 

katxt

Active Member
#2
I haven't come across this before, but many procedures have a weighted-data version. In this case you could possibly use the midpoints as the values and 1/range^2 as the weights; 1/range^2 is an attempt at 1/variance, which is the usual weighting.
Just a thought.
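To make the midpoint-and-weights idea concrete, here is a quick sketch using the four example individuals from the original post. It is just an illustration of the suggestion above, not a vetted method, and Python is used only for convenience; the same few lines translate directly to R.

```python
import numpy as np

# Age ranges (lower, upper) from the example data in the original post.
ranges = np.array([(12, 16), (14, 20), (18, 30), (11, 13)], dtype=float)

midpoints = ranges.mean(axis=1)        # midpoint of each range, e.g. (12+16)/2 = 14
widths = ranges[:, 1] - ranges[:, 0]   # width (precision) of each estimate
weights = 1.0 / widths**2              # rough 1/variance weighting, as suggested

# Weighted mean age at death: narrow (precise) ranges count for more.
weighted_mean = np.sum(weights * midpoints) / np.sum(weights)
print(weighted_mean)
```

Note how the narrow 11-13 range dominates: its weight (1/4) is 36 times that of the wide 18-30 range (1/144), which pulls the weighted mean well below the unweighted mean of the midpoints.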
 

katxt

Active Member
#3
Another idea which at least sounds plausible: depending on what analysis you plan to do with the data, perhaps you can design a Monte Carlo test where each data point draws from a suitable distribution, for instance a triangular distribution over its range.
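A minimal sketch of this Monte Carlo idea, assuming a symmetric triangular distribution peaked at the midpoint of each range. The data are the four example individuals again, and the statistic being simulated (the mean age) is just a placeholder for whatever analysis is actually planned. Python is used for illustration; it translates directly to R.

```python
import numpy as np

rng = np.random.default_rng(42)

# Age ranges (lower, upper) from the example data.
ranges = np.array([(12, 16), (14, 20), (18, 30), (11, 13)], dtype=float)
lower, upper = ranges[:, 0], ranges[:, 1]
mode = (lower + upper) / 2  # triangular peak at the midpoint of each range

n_sims = 10_000
# One simulated age per individual, per repetition:
draws = rng.triangular(lower, mode, upper, size=(n_sims, len(ranges)))

# Recompute the statistic of interest (here, the mean age) in each repetition.
means = draws.mean(axis=1)
print(means.mean(), means.std())  # centre and Monte Carlo spread of the mean
```

The spread of `means` reflects how much the dating uncertainty alone moves the statistic; the same loop works for any other statistic or test you substitute in.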
 
#4
Thanks for the responses.
I am not quite sure I understand you correctly. Both your ideas assume that the probability of death occurring is not equal for each time in the range, right? It probably isn't; however, we have no way of knowing at which point death is more or less likely (due to the limitations of the methods employed). I think Monte Carlo tests require a very large data set, right? I usually have between 50 and 200 observations at most, so it might be too small a number for Monte Carlo.
 

katxt

Active Member
#5
we have no way of knowing at which point death is more or less likely (due to the limitations of the methods employed)
You're right. The first idea assumes that the true value could be anywhere in the range but is more likely to be near the middle of the range than near either end. (For most projects there are limitations on the accuracy of the data due to the methods employed. It's just that usually the uncertainty is hidden more than in your case.)
The Monte Carlo idea, as it is expressed, also assumes that the middle is more likely than the ends, but if you think that it is equally likely anywhere in the range you could use a uniform distribution.
Monte Carlo methods don't require large data sets; 50 observations would be ample. Whether or not MC would be workable depends on what you are hoping to do. You should be able to estimate means, SDs, standard errors, and confidence intervals, and you could do one- and two-sample tests of various sorts, regressions, correlations, and ANOVAs.
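If death is taken as equally likely anywhere in each range, the sketch changes only in swapping the triangular draws for uniform ones. Here is that variant on the four example individuals, using the percentiles of the simulated means as a rough interval that folds in the dating uncertainty (illustrative data; Python for convenience, directly translatable to R).

```python
import numpy as np

rng = np.random.default_rng(0)

# Age ranges (lower, upper) from the example data.
ranges = np.array([(12, 16), (14, 20), (18, 30), (11, 13)], dtype=float)

n_sims = 10_000
# Each repetition draws every individual's age uniformly over its interval:
draws = rng.uniform(ranges[:, 0], ranges[:, 1], size=(n_sims, len(ranges)))
sim_means = draws.mean(axis=1)

# 2.5th and 97.5th percentiles of the simulated means.
lo, hi = np.percentile(sim_means, [2.5, 97.5])
print(f"mean age: {sim_means.mean():.1f}, 95% band: [{lo:.1f}, {hi:.1f}]")
```

Note this band only captures the dating uncertainty, not sampling uncertainty from having finitely many individuals; combining the two (e.g. by bootstrapping individuals within each repetition) would be a separate step.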