Search code examples
datesasretain

SAS: How do I delete records with a date within 30 days of the previous record, recursively?


Say I have a dataset of laboratory records for a set of patients (ordered by ID and date from earliest to latest) with this format, I want to remove all the bolded labs and keep all the non-bolded ones.

Basically, I want to only keep labs for a specific patient (labs from different patients should not be compared against each other) that are done >30 days from the previous lab, while not counting the removed labs when evaluating the next lab (i.e. if a lab is marked in red I want to ignore it when evaluating if the next lab is within 30 days and should thus be removed).

You can see for patient 1111, their 3rd lab is within 30 days of their 2nd lab, but because the 2nd lab is already marked for removal the 2nd lab does not count and thus the 3rd lab should be kept.

Does any one have any advice or suggestions for how this could be accomplished in SAS?

ID Lab_Date

1111 Jan 1 2023

1111 Jan 15 2023

1111 Feb 3 2023

1111 Feb 16 2023

2222 Jan 2 2023

2222 Jan 20 2023

2222 Feb 8 2023

2222 Feb 10 2023

2222 Feb 12 2023

3333 Jan 15 2023

3333 Feb 5 2023

3333 Feb 18 2023

I've tried retaining/lagging the lab date from the previous record and then comparing it to the current record, but this ends up removing records that shouldn't be removed.


Solution

  • You can use retain to do this:

    data have;
      input id$ lab_date anydtdte11.;
      format lab_date yymmdd10.;
      cards;
    1111 Jan 1 2023
    1111 Jan 15 2023
    1111 Feb 3 2023
    1111 Feb 16 2023
    2222 Jan 2 2023
    2222 Jan 20 2023
    2222 Feb 8 2023
    2222 Feb 10 2023
    2222 Feb 12 2023
    3333 Jan 15 2023
    3333 Feb 5 2023
    3333 Feb 18 2023
    ;
    run;
    
    data want;
      set have;
      by id;
    
      retain lab_date_pre;
      if first.id then lab_date_pre=lab_date;
      else if lab_date-lab_date_pre<=30 then delete;
      else lab_date_pre=lab_date;
    run;
    

    When first record, let lab_date be the benchmark date, named lab_date_pre.
    In the following records: If lab_date is within 30 days from the benchmark date, delete it. Else if lab_date is more than 30 days from the benchmark date, let lab_date be the new benchmark date.