Search code examples
if-statementsaswhere-clausetransposeretain

SAS Create New Variables


I would like to create a new variable 'Weight' and 'Height' using the variables A through D:

DATA: 
A       B     C    D       
Jim  Weight  180   Screen
Jim  Weight  200   C1
Jim  Height  60    Screen
Jim  Height  61    C3
Tod  Weight  190   Screen
Tod  Weight  201   C1
Tod  Height  70    Screen
Tod  Height        C1

The Weight variable would have the following criteria: Set Weight to Column C if Column B = Weight, Column D = C1 and Column C is not missing. Else if Column D is not C1 or Column C is missing, then use the Column C where Column D is Screen. So in plain terms, let's say Jim was weighed during screening and not for C1 then I would like to keep his screen weight. Or if he was screened at C1 but the wight is missing, I would like to keep his screen weight. Similarly for the Height variable.

My code which is incorrect is:

DATA MYTEST; 
    SET TEST.TEST; 
    if B = 'WEIGHT' and D = 'C1D1' and not missing(C) then NEW = C;
    else if (missing(C) or D ~= 'C1') and B = 'WEIGHT' then WEIGHT = C where D = 'Screen';
    if B = 'HEIGHT' and D = 'C1D1' and not missing(C) then NEW = C;
    else if (missing(C) or D ~= 'C1') and B = 'HEIGHT' then WEIGHT = C where D = 'Screen';
    else WEIGHT = 'NA';
 RUN; 
 PROC PRINT DATA = MYTEST; 
 RUN; 

Desired Outcome:

DATA: 
A    Weight   Height 
Jim   200       60
Tod   201       70

Solution

  • The result data can be created using an update statement, applied to the data after transposition. UPDATE is different than MERGE in this way -- missing values in the update data set will never overwrite an existing value in PDV.

    DATA have;
    input 
    a $  b $     c     d $; datalines;
    Jim  Weight  180   Screen
    Jim  Weight  200   C1
    Jim  Height  60    Screen
    Jim  Height  61    C3
    Tod  Weight  190   Screen
    Tod  Weight  201   C1
    Tod  Height  70    Screen
    Tod  Height  .     C1
    run;
    
    proc transpose data=have out=haveT;
      by a d notsorted;
      var c;
      id b;
    run;
    
    data haveScreen / view=haveScreen;
      set haveT;
      where d='Screen';
      by a;
      if first.a;
    run;
    
    data want;
      update
        haveScreen
        haveT (where=(d in ('Screen', 'C1')))
      ;
      by a;
    run;
    

    Your posted code is not using WHERE correctly. Where clauses are not conditionally applied and also can not be changed dynamically while the data step is running. A where clause is applied at 'run initialization` time. The where clause is a non-executable/unconditional statement and the last occurring one in data step code is the one that will be applied when the step runs.

    For example, in the following, the if 0 is never true yet the where is applied regardless.

    options msglevel=i;
    data _null_;
      set sashelp.class;
      if name =: 'X' then where age > 12;
      if 0 then where age > 1;
    run;
    ----- LOG -----
    4625  options msglevel=i;
    4626  data _null_;
    4627    set sashelp.class;
    4628    if name =: 'X' then where age > 12;
    4629    if 0 then where age > 1;
    NOTE: WHERE clause has been replaced.
    4630    run;
    
    NOTE: There were 19 observations read from the data set SASHELP.CLASS.
          WHERE age>1;