Survival analysis advice



I'm having trouble wrapping my head around whether my data is censored or not, and interpreting the results of the (possibly inappropriate) model which I have selected to perform a regression.

I have data recording incubation survival rates for 4 different penguin colonies, in two different locations (islandn, islandg) over two different years (2012, 2013). Sample sizes vary, as do colony sizes.

Question 1:
The data covers the entire incubation period (about 34 days in all cases), so every penguin egg either hatched or is "dead" (i.e., was abandoned) and will never hatch. From what I understand, this means that the data is not censored, is that right?

Question 2:
I have another set of data for another colony (colony size 1000 nests, year 2012, islandg), where the researchers only started recording what was going on 14 days before hatching - does this mean it's left censored?

Question 3:
What I would like to do is find a model which will analyse my complete data and work out a formula for survival (given island, year and colony size), which I can then use to try to predict what the left-censored dataset should be. What I want to ask the model is: I have 60 chicks hatching out of 85 during the last 14 days of incubation, so how many nests would I have had to mark at the start of the laying period?
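To show the arithmetic I have in mind, here is a minimal sketch. It assumes the complete-data model can supply S(20), the probability that a nest survives from laying to day 20 (i.e. 34 - 14), and that the same curve applies to the partially observed colony. The value 0.80 is a placeholder I made up, not a result, and the variable names are hypothetical.

```r
# Hypothetical inputs: the survival probability below is a placeholder,
# NOT an estimate from any real fit.
observed_at_entry <- 85    # nests being followed when recording started
hatched           <- 60    # of those, the number that hatched
surv_to_entry     <- 0.80  # assumed S(20): P(nest survives to day 20 of ~34)

# Expected number of nests at laying, if the complete-data model applies.
nests_at_start <- observed_at_entry / surv_to_entry

# Observed survival over the last 14 days, for comparison with the
# model's conditional survival S(34) / S(20).
cond_survival <- hatched / observed_at_entry

print(nests_at_start)
print(round(cond_survival, 3))
```

If the model's S(34) / S(20) were far from the observed 60/85, that would suggest the complete-data curve does not transfer well to this colony.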

So to do this I think I need to use the survreg function in R, or possibly the Cox proportional hazards model. But I think that because my data is not censored I know the distribution, so I should use the accelerated failure time model that survreg will give me. Please correct me if I'm wrong. When using survreg, though, I am not sure what the event actually is when I create my survival object. The help file says:

The status indicator, normally 0=alive, 1=dead. Other choices are TRUE/FALSE (TRUE = death) or 1/2 (2=death). For interval censored data, the status indicator is 0=right censored, 1=event at time, 2=left censored, 3=interval censored. Although unusual, the event indicator can be omitted, in which case all subjects are assumed to have an event.

But what is my event? I want to know how many chicks die during the incubation period (and when), so I am using 1 (dead) and 0 (hatched), which I have for every row of my data. Does this sound right, or should I be doing 1 (hatched) and 0 (dead)?
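To make the two codings concrete, here is a minimal sketch. The data frame, its column names and its values are made up for illustration and are not my real data. With event = death, a hatched egg is treated as right-censored at the day it hatched; with event = hatching, the abandoned eggs become the censored ones.

```r
library(survival)

# Hypothetical toy data: one row per nest (all names and values are made up).
nests <- data.frame(
  days    = c(34, 12, 34, 20, 34, 5),  # day the egg hatched or was abandoned
  hatched = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

# Coding 1: event = death (abandonment); hatched eggs count as
# right-censored at the day they hatched.
nests$dead <- as.integer(!nests$hatched)
s_dead <- Surv(time = nests$days, event = nests$dead)

# Coding 2: event = hatching; abandoned eggs are the censored ones.
s_hatch <- Surv(time = nests$days, event = as.integer(nests$hatched))

print(s_dead)   # a "+" marks a censored (hatched) nest
print(s_hatch)  # a "+" marks a censored (abandoned) nest
```

Printing the two objects side by side shows which rows each coding treats as censored.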

Question 4:
Then, when I create the survival object, I also have to say how my data is censored. Would mine be "right", "left", "counting", "interval", "interval2" or "mstate"? I am SO confused about this. I have read a lot and really tried to understand, but my brain has given up, so I would really appreciate any help.

Question 5:
The other thing I would like help with is interpreting the results of a Weibull survreg model. I just tried it using type="right" and 1 (dead) / 0 (hatched), and I used survivalobject ~ year + island + colonysize as the formula. I think this means: work out how much of a difference year makes to the survival curve vs. how much of a difference island makes vs. how much of a difference colonysize (the only continuous variable) makes. So I was expecting to see something like this (with the blanks filled in):

So I could see the effect of each level of the factor variables (islandn, islandg, year2013, year2012) as well as the continuous variable (presumably the "value" (marked x) would then tell me how much the intercept changes for every nest in a colony - perhaps I should divide my colony sizes by 100 or something to make this more meaningful). Instead of seeing this, though, I get values, standard errors, z values and p values for Intercept, year2013, islandn, colonysize and Log(scale). Why is this?
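For reference, a minimal sketch that reproduces this pattern of row names on simulated data (every column name and value here is invented for illustration). With R's default treatment contrasts, the first level of each factor (here year 2012 and island "g") is the baseline absorbed into the intercept, so only year2013 and islandn get rows of their own. Log(scale) is the log of the Weibull scale parameter that survreg estimates alongside the coefficients.

```r
library(survival)

set.seed(1)
n <- 200
# Hypothetical covariates, one row per nest.
toy <- data.frame(
  year       = factor(sample(c(2012, 2013), n, replace = TRUE)),
  island     = factor(sample(c("g", "n"), n, replace = TRUE)),
  colonysize = sample(c(50, 150, 400, 1000), n, replace = TRUE)
)

# Simulated failure days; nests still alive at day 34 hatch, which is
# treated as right-censoring for the event "death".
fail_day <- rweibull(n, shape = 1.5, scale = 60)
toy$days <- pmin(fail_day, 34)
toy$dead <- as.integer(fail_day < 34)

fit <- survreg(Surv(days, dead, type = "right") ~ year + island + colonysize,
               data = toy, dist = "weibull")

print(levels(toy$year))    # first level is the baseline
print(levels(toy$island))  # first level is the baseline
print(names(coef(fit)))    # baseline levels have no row of their own
print(summary(fit))        # also reports Log(scale)
```

Each factor coefficient is therefore a difference from the baseline level, on the log-time scale, rather than a separate effect for every level.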

Even if it's not possible for me to do the regression I described at the beginning of Question 3, I still have to analyse my data, so I really need to understand the model output.

Question 6:
I read somewhere that in order to translate the Value(s) you get from survreg into meaningful probabilities, you need to convert them somehow. I think this depends on the distribution, and I'm sure it's really simple, but I haven't seen it in any examples, and when I Google it I can't find anything. How do I do it?
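For the Weibull case, a minimal sketch of one such conversion, on simulated data with invented names. survreg fits log(T) = lp + scale * W, so in terms of R's own Weibull functions the shape is 1 / fit$scale and the scale is exp(lp), where lp is the linear predictor for a given nest. The survival probability at day t is then exp(-(t / exp(lp))^(1 / fit$scale)).

```r
library(survival)

set.seed(42)
n <- 300
# Hypothetical data: two islands with different simulated failure scales.
toy <- data.frame(island = factor(sample(c("g", "n"), n, replace = TRUE)))
true_scale <- ifelse(toy$island == "n", 45, 70)
fail_day <- rweibull(n, shape = 1.5, scale = true_scale)
toy$days <- pmin(fail_day, 34)          # hatching at day 34 censors "death"
toy$dead <- as.integer(fail_day < 34)

fit <- survreg(Surv(days, dead) ~ island, data = toy, dist = "weibull")

# Linear predictor (log-time scale) for a nest on island "n".
nd <- data.frame(island = factor("n", levels = levels(toy$island)))
lp <- predict(fit, newdata = nd, type = "lp")

# Survival probability at day t, written out by hand.
surv_prob <- function(t) exp(-(t / exp(lp))^(1 / fit$scale))

# The same quantity via R's built-in Weibull distribution function.
s34       <- surv_prob(34)
s34_check <- pweibull(34, shape = 1 / fit$scale, scale = exp(lp),
                      lower.tail = FALSE)

print(c(by_hand = unname(s34), pweibull = unname(s34_check)))
```

Here surv_prob(34) would be the estimated probability that a nest on island "n" is still alive at day 34, i.e. that it hatches.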

Question 7:
I picked the Weibull distribution because it seems to be the most flexible for survreg models, but is there any way of testing how well my model actually fits? Presumably by looking at the residuals, or would I do something else?
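Two checks I have seen suggested, sketched here on simulated data with invented names, are comparing AIC across the distributions survreg offers and comparing the fitted Weibull curve with a non-parametric Kaplan-Meier estimate. This is only an illustration under those assumptions, not a full diagnostic procedure.

```r
library(survival)

set.seed(7)
n <- 300
# Hypothetical data: simulated failure days, censored by hatching at day 34.
fail_day <- rweibull(n, shape = 1.5, scale = 60)
toy <- data.frame(days = pmin(fail_day, 34),
                  dead = as.integer(fail_day < 34))

# Check 1: AIC for several candidate distributions (lower is better).
dists <- c("weibull", "exponential", "lognormal", "loglogistic")
aic <- sapply(dists, function(d) {
  AIC(survreg(Surv(days, dead) ~ 1, data = toy, dist = d))
})
print(sort(aic))

# Check 2: fitted Weibull survival vs. Kaplan-Meier at a few days.
km  <- survfit(Surv(days, dead) ~ 1, data = toy)
fit <- survreg(Surv(days, dead) ~ 1, data = toy, dist = "weibull")

days_grid <- c(10, 20, 30)
km_surv <- summary(km, times = days_grid)$surv
wb_surv <- pweibull(days_grid, shape = 1 / fit$scale,
                    scale = exp(coef(fit)[1]), lower.tail = FALSE)

print(round(cbind(day = days_grid, kaplan_meier = km_surv,
                  weibull = wb_surv), 3))
```

Large gaps between the two survival columns would be a sign that the chosen distribution does not describe the data well.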


I really really appreciate any help with any of these questions. I am sorry for asking so much but I feel like I am getting nowhere on my own and I am a complete stats newbie. I feel like most of these questions are more general rather than R-syntax questions, so I hope I am posting in the right forum.