fill missing variables by cycle

#1
Hi all,

I am on this for some time now and don't know how to solve it.
It is actually only a small part of a much bigger task, but I am stuck and cannot go further.

I have quite a large dataset (ds) with around 40 variables.
To every ID there are multiple cycles. In every cycle there are multiple parameters with date and day (derived from date).
I have extracted the day for each group of parameters (in my example H* and F*) and written in separate variables [h_day and f_day].
Now it gets tricky (at least for me):
Next step is to fill the missings of both variables per cycle.

Attached is a snapshot of some mock-up data. There you can see (in a very minimal way) the initial dataset I am using, what I get using retain and how it should look in the end.
Also not ideal is the sorting and merging. If I use id cycle or id cycle parameter that is not unique. However I can get rid of duplicates later on.

Any help is appreciated as I am just starting to code.

Thanks in advance,
Gingerfish
 
#3
Hi Dason,

that is exactly the problem. I don't know how to fill it.

In the picture attached you can see the wrong dataset I got using a retain statement.
data xy (drop=_hd _fd);
merge x y;
by id cycle;
retain _hd _fd;
if not missing(hema_day) then _hemsd = hema_day ;
else hema_day = _hemsd ;
run;

problem with that is, retain fills each missing observation with the value of the observation above and it does not care about the cycle.
I tried something like
do until(last.cycle)
but it did not work either.

Is there something like "for each cycle do"? I guess that would help.

Thanks,
gingerfish
 

Dason

Ambassador to the humans
#4
Hi Dason,

that is exactly the problem. I don't know how to fill it.
No. I don't understand why you want the thing that you say SAS is filling in wrong to be 23. I don't understand what your scheme is for filling in the data at all. Can you elaborate a little more on why that spot should be 23?
 
#5
Ah I see.

I guess you are talking about colums h_day.
It should be 23 because it appears in cycle B. 14 is the value which is valid in cycle A.

Both, h_day and f_day, should be filled by cycle. So whenever a new cycle starts, the missings should be filled with what is already available in that cycle. Not with something from the cycle before (or after).

It gets complicated as parameters in one cycle can have different dates (and days). Sometimes H1 and H2 occure before, at the same date or after the date of Fx.
At least Hx parameters are always on the same date as well as Fx parameters are.

I hope I could clear things up.
Thank you very much for lookning into this!
gingerfish
 
#6
Hi all,

I solved it! :-D

what I did is the following:
Code:
 proc sort data=x1;
   by id cycle h_day;
run;
 
data x2 (drop = _fd);
   set x1;
   retain _fd;
   if not missing(f_day) then  _fd = f_day ;
   else f_day = _fd ;
run;
proc sort data = x2;
   by id cycle descending h_day;
run;
 
data x3 (drop = _hd);
   set x2;
   retain _hd;
   if not missing(h_day) then  _hd = h_day ;
   else h_day = _hd ;
run;
proc sort data = x3;
   by proto crtn pt lbdt lbcpe lbparm;
run;
So I had to do two steps in stead of one, because I needed two kinds of sorting to fill the missings correctly per cycle.

I am sorry I did not explain it in detail in the first place. I was looking at that problem for hours and didn't see anything anymore ...

Thanks very much,
Tim