Predicted probabilities from logistic regression


I have carried out a logistic regression in Stata to adjust for several patient characteristics that are thought to affect my outcome. I used the predict command to get the predicted probability of a positive outcome for each patient. Ultimately I would like monthly data to plot as a time series. Before I ran the regression I simply had raw counts of my outcome by month, so I would like something similar.

I have two issues:

1. Each patient has spent a proportion of their hospital stay in one or several of 10 wards. Is it appropriate to multiply the predicted probabilities by the proportion of time spent in a ward? Would this give me the probability of a positive outcome on each ward?

2. To get the data into a monthly time series, would the sum of predicted probabilities for each month or the mean of predicted probabilities for each month be more appropriate?
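To make question 2 concrete, here is a minimal sketch of the two monthly summaries. The records and the name monthly_summaries are hypothetical, made up purely for illustration, and this is Python rather than Stata. Under the model, the sum of predicted probabilities in a month is the expected number of positive outcomes in that month, while the mean is the expected proportion of patients with a positive outcome, so the two differ only by the number of patients in the month.

```python
from collections import defaultdict

# Hypothetical records: (month, predicted probability of a positive outcome).
# These values are invented for illustration; they are not real data.
records = [
    ("2012-01", 0.10),
    ("2012-01", 0.30),
    ("2012-02", 0.20),
    ("2012-02", 0.40),
    ("2012-02", 0.60),
]


def monthly_summaries(rows):
    """Return {month: (sum of probabilities, mean of probabilities)}."""
    grouped = defaultdict(list)
    for month, prob in rows:
        grouped[month].append(prob)
    return {
        month: (sum(probs), sum(probs) / len(probs))
        for month, probs in sorted(grouped.items())
    }


summaries = monthly_summaries(records)
for month, (total, mean) in summaries.items():
    # total ~ expected count of positives, mean ~ expected proportion
    print(month, round(total, 3), round(mean, 3))
```

If the number of patients varies a lot from month to month, the sum will track patient volume as well as risk, whereas the mean will not.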

I realize that although I have posted in the regression forum, this may actually be more of a probability problem.

Any advice would be greatly appreciated.

You can control for ward if you have enough data to support testing that number of parameters.
Thanks for your reply.

Do you mean control for ward by adding the 10 variables for proportion of time spent on each ward to the logistic model? My data has columns ward1, ..., ward157, indicating all the possible ward changes that a patient could have had. I am really only interested in whether they have been on the 10 wards of interest. I have over 63,000 observations.
Shoving lots of parameters into canned software for logistic regression (or any other regression) has gotten people into trouble before, even when there's a lot of data.

Start with simple descriptions of the data, whichever are most relevant to your study questions, e.g. the proportion of positive outcomes in each of the 10 wards of interest. Also try making some plots of the data. Write out the full model by hand on paper and understand precisely what each parameter means, exactly as it is represented by your software of choice. In short, get a better understanding of the situation before jumping into a black box.
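As one example of such a simple description, the sketch below computes the proportion of positive outcomes among patients who spent any time on each ward of interest. The data layout, the ward labels and the name positive_proportion_by_ward are assumptions made up for illustration (and it is Python, not Stata). Your real data will be shaped differently.

```python
# Hypothetical stays: (set of wards visited during the stay, outcome 1 or 0).
# Ward labels "A", "B", "C" stand in for the wards of interest.
stays = [
    ({"A", "B"}, 1),
    ({"A"}, 0),
    ({"B", "C"}, 0),
    ({"C"}, 1),
    ({"A", "C"}, 1),
]
wards_of_interest = ["A", "B", "C"]


def positive_proportion_by_ward(rows, wards):
    """Proportion of positive outcomes among patients who visited each ward."""
    counts = {w: [0, 0] for w in wards}  # ward -> [positives, patients]
    for visited, outcome in rows:
        for w in wards:
            if w in visited:
                counts[w][0] += outcome
                counts[w][1] += 1
    # None marks a ward that no patient in the data visited.
    return {w: (pos / n if n else None) for w, (pos, n) in counts.items()}


proportions = positive_proportion_by_ward(stays, wards_of_interest)
for ward, prop in proportions.items():
    print(ward, prop)
```

A patient who visited several wards is counted once in each of them, so these proportions describe who passed through a ward. They do not say where the outcome was acquired.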
Yeah, the reason I'm dubious about adding ward to the model is that I don't know the exact point (time or ward) at which the patient acquired the outcome. I only know that they got the outcome at some point during their stay, and I have a list of the wards (and times) they have been in.

The other variables in my model are all patient characteristics that are potential (published) factors that affect whether a patient acquires the outcome. So the info I have on ward doesn't seem to fit with the rest of my independent variables.