Search code examples
datesasmacros

Editing SAS variables with a 2-part hierarchical suffix based on year and month


I am trying to edit variables in a sas dataset. The dataset has multiple types of variables which repeat for quite a few years, each month. The variables are character variables and have names like 'X_1999_01'. The basic idea is that I want to compare 1-2 dates (date1 and date2 below) that each observation might have with the implied dates represented by the variable name (basically the year and month in the name and assuming the first day of the month - 'refdate' below) and adjust the variable if one or more inequalities/equalities between the dates hold (see code). I've created a simple toy dataset below with this structure (including missingness) and show the macro code I have tried. I simplify the data to 2 years and 3 months, but, irl brute force is not an option. If anyone has ideas how to make the macro work or can suggest an alternative approach, it would be much appreciated.

data toydata ; 
    input var1999_01 $ var1999_02 $ var1999_03 $ var2000_01 $  
       var2000_02 $ var2000_03 $ date1 : date9. date2 : date9. ;
    format date1 date2 yymmddn8. ;
    datalines ;
    . . . no yes yes 01FEB2000 .
    yes yes yes no no no 01JUL1998 01JAN2000
;
run ;

/* to see the data */
proc print data=toydata ; run ;
proc contents data=toydata ; run ;

%macro failed ;
    data wantdata  ;
        set toydata ;
        %do i=1999 %to 2000 ;
            %let year = &i. ;
            array var&year. [3] $ var&year._01-var&year._03 ;
            %do j=1 %to 3 ;     

                    /*the macro variable refdate is defined based 
                    on the year (i loop) and month (j loop) in the 
                   variable name, assuming 1st day of the month. */

                   
                    %let refdate = %sysfunc(mdy(&j., 1, &year.)) ;

                    /* first condition is whether date1 (a date) is 
                     greater than (i.e., after) the reference date 
                     implied by the variable name, e.g., var1999_01 
                     implies 01Jan1999. */

                    %if date1 > &refdate. %then var&year.[&j.] = 'nye' ;

                    /* the second condition is whether date 2 (if 
                       not missing) is equal to implied date and 
                       then if it's less than (before) the implied 
                       date */

                    %if %length(date2) %then %do ;
                        %if date2 = &refdate. %then var&year.[&j.] = 'event this month' ;
                        %if date2 < &refdate. %then var&year.[&j.] = 'past event' ;
                        %end ;
                %end ;
            %end ;
        run ;
    %mend ;

%failed 

Solution

  • To dynamically access a particular variable from a list of variables in a data step you define an array and index into the array.

    Since the suffix on the variable names seems to indicate YEAR and MONTH just use INTCK() function to calculate which index to use into the array.

    I cannot tell what you are actually trying to do so let's just make something that captures the value of the EVENT variables (the ones with the month encoded into their names) based on month of the current DATE variable being checked.

    %let basedate='01JAN1999'd;
    data want;
      set have ;
      array events $3 var1999_01-var1999_12 var2000_01-var2000_12 ;
      array dates date1-date3 ;
      array event_this_month [3] $3 ;
      do index=1 to dim(dates);
        month = 1 + intck('month',&basedate,dates[index]);
        if 1<=month<=dim(events) then 
          event_this_month[index] = events[month]
        ;
      end;
    run;
    

    So if DATE1 is in the month DEC_1999 then the value of EVENT_THIS_MONTH1 will be the value of VAR1999_12.

    If instead you mean to use the two date variables to indicate a range of months then perhaps you want to run the index month over the set of values implied by those two dates?

    So to set all of the variables from DATE1 to DATE2 to the value 'yes' you could use a loop like this:

    %let basedate='01JAN1999'd;
    data want;
      set have ;
      array events $3 var1999_01-var1999_12 var2000_01-var2000_12 ;
      do month=max(1,intck('month',&basedate,date1))
           to  min(dim(events),intck('month',&basedate,date2))
      ;
        events[month] = 'yes';
      end;
    run;
    

    So in the comments you seem to say that you want to take the index based on the MIN of DATE1 and DATE2 and set the array entries AFTER that to 'nye'. Given your example data perhaps it is more likely you meant to start from the MAX or the two dates?

    Let's try it, first lets fix your example data to have contiguous month variables.

    data have ; 
      format date1 date2 yymmdd10. ;
      array events $3 var1999_10-var1999_12 var2000_01-var2000_03;
      input var1999_10--var2000_03 date1 :date. date2 :date.;
    datalines ;
    . . . no yes yes 01FEB2000 .
    yes yes yes no no no 01JUL1998 01JAN2000
    ;
    

    Now the base date for the array of date based variables is 01OCT1999

    %let basedate='01OCT1999'd;
    data want;
      set have ;
      array events $3 var1999_10-var1999_12 var2000_01-var2000_03;
      do month=max(1,1+intck('month',&basedate,max(date1,date2))) to dim(events);
        events[month] = 'nye';
      end;
      drop month;
    run;
    

    Results

    enter image description here