I have two data sets :
I combine the two as follows:
data combined;
set mothers all_women;
Now as the mothers are already in the datset all_women, I want to delete the repeated entries - with the condition that I keep the observations where "MOTHER"=1.
I tried the following:
proc sort data=combined; by ID DESCENDING MOTHER; run;
proc sort data=combined nodupkeys; by ID; run;
yet I lose some of the mothers because I am left with only 5458 observations where "MOTHER"=1. What have I done to introduce this error?
Instead of using NODUPKEY, use FIRST./LAST. processing.
proc sort data=combined;
by ID DESCENDING MOTHER;
run;
data want;
set combined;
by ID descending mother;
if not (first.ID) and (mother=0) then delete;
run;
That would keep any ID that had mother=0 only (keep 1 record per), and would keep all mother=1 rows.