If statement braces not producing an output

Hello and thanks for your help,

I am fairly new to Stata and am having issues trying to get the following if statement to run. When I run the "if statement" from my "do file" it simply shows the code without any changes occurring.
I found the only way it will run is if I exclude the following condition: "(end_date - start_date[_n+1]) >= -1"

Here is the code:

gen HEPLOS=.
gen Stopper=0
if ((((end_date - start_date[_n+1]) >= -1) & (ID== ID[_n+1])) & Stopper!= 1) {
replace HEPLOS= (end_date[_n+1] - start_date)
replace Stopper=1
replace Stopper[_n+1]=1

Is it coded incorrectly? I don't understand why I don't get any output even when I put trace on.

Essentially, what the code is supposed to do is find the total length of stay for an "episode", which accounts for re-admissions on the same day or one day later. I have several of these statements, and they only get larger (but are essentially identical) as I count patients admitted 2, 3, 4 times consecutively. I created a "Stopper" variable which stops the code from proceeding if an admission has already been accounted for by one of those larger re-admission blocks (to avoid double counting).

Thanks again for any help!


You're getting mixed up between the "if" qualifier and the "if" command. See -help if- and -help ifcmd-. I don't really understand your data structure so can't offer any specific advice on how to fix the problem. If you post a sample of your data along with how you'd like it to look that would make it a lot easier.
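As a minimal sketch of the difference (the toy variables id, x and big are made up for illustration and are not from your data): the -if- command evaluates its expression once, and a bare variable name in it is taken to mean the value in the first observation, whereas the -if- qualifier is evaluated separately for every observation.

```stata
clear
input id x
1 5
2 10
3 15
end

* -if- command: the expression is evaluated once; a bare variable name
* such as x is taken to mean its value in the first observation (x[1]).
if x > 7 {
    display "never shown here, because x[1] is 5"
}

* -if- qualifier: the expression is evaluated separately for each observation.
generate byte big = 0
replace big = 1 if x > 7
list
```

Here the block under the -if- command is skipped entirely, while the -replace- with the qualifier sets big to 1 in observations 2 and 3. That is probably why your braces appear to do nothing: the condition is checked once and, if it is false, none of the -replace- commands run.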


Thanks for the response and any help is appreciated!

My data resembles this:
(Note: each column represents an admission and discharge from the hospital)

View attachment 4855
View attachment 4856

So just to clarify my goal:
• The variable “HEPLOS” = hospital episode length of stay
  o Within a hospital episode, a patient who is re-admitted on the same day or one day later counts as never having truly left the hospital, so the length of stay runs from the original start date until a discharge that is not followed by a re-admission within 0-1 days
• So HEPLOS needs to account for how many times the patient was re-admitted (this could range from not being re-admitted at all to being re-admitted 5 times, always using the 0-1 day definition)
• I would like to code it as you currently see it in the "data example", where HEPLOS is given only on the first admission of a “hospital episode”
  (the re-admissions column is just for clarification)

I hope this makes sense, or at least my previous code will be somewhat understandable given this information.

I'm new to this so thanks for the patience, any suggestions or comments are appreciated!


This is how I would do it:
bysort id (start_date end_date): gen episode=sum(start_date - end_date[_n-1]>1)
egen heplos=total(los), by(id episode)
This will give you heplos for every row in the episode. If for some reason you only want it for the first admission (I'm not sure why you'd want this) you could remove the extra ones using:
bysort id episode (start_date end_date): replace heplos=. if _n>1
See -help by-, -help sum()- and -help egen-. It is well worthwhile to learn how to use -by-, which is one of Stata's most powerful features despite looking deceptively simple. There's a good section in the User's Guide, and you may also want to check out this tutorial:
Speaking Stata: How to move step by: step
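
If it helps, here is a small sketch of those three commands run on made-up data. The dates are plain day numbers, and I am assuming los is each admission's own length of stay (end_date - start_date), which may not match how your los was built.

```stata
clear
input id start_date end_date
1 100 103
1 104 106
1 110 112
2 200 201
2 201 205
end

* Assumption: los is the length of stay of each single admission, in days.
generate los = end_date - start_date

* Number the episodes within each patient: the counter goes up by 1 whenever
* an admission starts more than 1 day after the previous discharge.
bysort id (start_date end_date): generate episode = sum(start_date - end_date[_n-1] > 1)

* Total length of stay per episode, repeated on every row of the episode.
egen heplos = total(los), by(id episode)

* Optional: keep heplos only on the first admission of each episode.
bysort id episode (start_date end_date): replace heplos = . if _n > 1

list, sepby(id episode)
```

With this toy data, patient 1 gets two episodes (the first two admissions together, then the third on its own) and patient 2 gets one. Note that summing los does not count the gap day between an admission and a re-admission one day later; if that day should count, the difference between the last end_date and the first start_date in the episode would be one alternative.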


Thank you for your help, I will look into the by command and become more familiar with it.

If you have a second, do you mind explaining this section of your code: gen episode=sum(start_date - end_date[_n-1]>1).
I really do not understand the use of sum and how it works in this context. I can only understand it generally, where all those reoccurring hospitalizations have the same episode which allows us to calculate the HEPLOS through the LOS variable.


sum() creates a running sum from one observation to the next. In this case, it's creating a running sum of the expression "start_date - end_date[_n-1]>1". This expression compares the start date of the current observation with the end date of the previous observation (within each patient, since it's running -by- id). If the start date is more than 1 day later than the previous end date (i.e. a new episode), the expression is true and returns 1; otherwise it's false and returns 0. For the first observation of each patient, end_date[_n-1] is missing, and because Stata treats missing as larger than any number the expression is also true there, so each patient's first admission starts episode 1. Since the expression is true only at the start of a new episode, the variable created by sum() increases by 1 with each new episode.
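
To make that concrete, here is a sketch that splits the one-liner into its two steps on made-up data (new_episode is just an illustrative helper name, and the dates are plain day numbers):

```stata
clear
input id start_date end_date
1 100 103
1 104 106
1 110 112
1 113 115
1 120 121
end

sort id start_date end_date

* Step 1: flag the first admission of each episode (1 = true, 0 = false).
* The first row of each id is flagged because end_date[_n-1] is missing there.
by id: generate byte new_episode = start_date - end_date[_n-1] > 1

* Step 2: running sum of the flag within id gives the episode number.
by id: generate episode = sum(new_episode)

list, sepby(id)
```

In this example new_episode comes out as 1, 0, 1, 0, 1 and episode as 1, 1, 2, 2, 3, so rows that belong to the same stay share an episode number.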