I need help calculating the number of days between startdate and enddate without accounting for the overlapping days. My data is as followed:
ID STARTDATE ENDATE N_DAYS OVERLAP
1 01-Jul-2000 07-Jul-2000 6.00 .00
1 01-Jul-2000 01-Dec-2000 153.00 1.00
1 30-Sep-2000 13-Oct-2000 13.00 1.00
2 25-May-1998 05-Jun-1998 11.00 .00
3 29-Apr-2000 12-May-2000 13.00 .00
3 27-May-2000 09-Jun-2000 13.00 .00
3 01-Jul-2000 01-Dec-2000 153.00 .00
3 02-Sep-2000 15-Sep-2000 13.00 1.00
I can calculate the number of days between the start date and the end date.
COMPUTE N_DAYS = DATEDIFF(ENDATE,STARTDATE,"day").
EXECUTE.
However, there are some overlapping dates for the same ID. For example, for ID 1, there is a total overlap of 19 days. The total number of days for ID 1 should be 153 days.
I can flag the overlap with the following syntax.
SORT CASES BY ID.
COMPUTE OVERLAP=0.
IF ID=LAG (ID) AND STARTDATE < LAG(ENDDATE) OVERLAP=1.
EXECUTE.
Then I’m not sure whether I should create another variable STARTDATE1 AND ENDDATE1 to flag the earliest start date and the latest end date by ID, then recalculate the date differences. If yes, how do I create such syntax? I have started the following syntax, however it picks up all the dates. For ID 1, I want the syntax to only pick up 01-Jul-2000 but it also picks up 30-Sep-2000.
If ID=LAG (ID) AND OVERLAP=1 STARTDATE1=MIN(STARTDATE).
EXECUTE.
Or is there another way to calculate the days difference without accounting for the overlapping days for each ID?
Thank you.
ID STARTDATE ENDATE N_DAYS OVERLAP
1 01-Jul-2000 07-Jul-2000 6.00 .00
1 01-Jul-2000 01-Dec-2000 153.00 1.00
1 30-Sep-2000 13-Oct-2000 13.00 1.00
2 25-May-1998 05-Jun-1998 11.00 .00
3 29-Apr-2000 12-May-2000 13.00 .00
3 27-May-2000 09-Jun-2000 13.00 .00
3 01-Jul-2000 01-Dec-2000 153.00 .00
3 02-Sep-2000 15-Sep-2000 13.00 1.00
I can calculate the number of days between the start date and the end date.
COMPUTE N_DAYS = DATEDIFF(ENDATE,STARTDATE,"day").
EXECUTE.
However, there are some overlapping dates for the same ID. For example, for ID 1, there is a total overlap of 19 days. The total number of days for ID 1 should be 153 days.
I can flag the overlap with the following syntax.
SORT CASES BY ID.
COMPUTE OVERLAP=0.
IF ID=LAG (ID) AND STARTDATE < LAG(ENDDATE) OVERLAP=1.
EXECUTE.
Then I’m not sure whether I should create another variable STARTDATE1 AND ENDDATE1 to flag the earliest start date and the latest end date by ID, then recalculate the date differences. If yes, how do I create such syntax? I have started the following syntax, however it picks up all the dates. For ID 1, I want the syntax to only pick up 01-Jul-2000 but it also picks up 30-Sep-2000.
If ID=LAG (ID) AND OVERLAP=1 STARTDATE1=MIN(STARTDATE).
EXECUTE.
Or is there another way to calculate the days difference without accounting for the overlapping days for each ID?
Thank you.