Search code examples
saslagsas-macro

Calculate new variable based on existing variables using SAS


I have been trying to calculate a new field based on existing fields using SAS. I tried lag function but didn't work. Below is the data and program I have tried.

Raw Data:
Cust_ID Prdct_No no_of_mnths Actual_Shp_Dt Count
x 12 2 8/1/2014 10
x 12 1 8/11/2014 10
x 12 0 8/23/2014 10
y 13 2 8/1/2014 10
y 13 1 8/11/2014 10
y 14a 2 8/1/2014 10
y 14a 1 8/11/2014 10
y 14a 0 8/21/2014 10
z 15 3 9/1/2014 20
z 15 2 9/15/2014 20

Required results:
Target_Ship_Dt is sum of (lag(Actual_Shp_Dt) and lag(Count)) grouped by Cust_ID and Prdct_No. If Actual_Shp_Dt is late or earlier than target_shp_Dt, then Target_Shp_Dt is SUM(lag(Target_Shp_Dt) and lag(Count)).

Cust_ID Prdct_No no_of_mnths Actual_Shp_Dt Count Target_Ship_Dt
x 12 2 8/1/2014 10
x 12 1 8/11/2014 10 8/11/2014
x 12 0 8/23/2014 10 8/21/2014
y 13 2 8/1/2014 10
y 13 1 8/11/2014 10 8/11/2014
y 14a 2 8/1/2014 10
y 14a 1 8/11/2014 10 8/11/2014
y 14a 0 8/21/2014 10 8/21/2014
z 15 3 9/1/2014 20
z 15 2 9/15/2014 20 9/21/2014

Code I have tried:
Data Raw_Data(drop = Prdct_ID_lag Actual_Shp_Dt_lag Count_Lag Target_Shp_Dt1);
Set Raw_Data;
By Cust_ID Prdct_ID;
Prdct_ID_lag = lag(Prdct_ID);
Format Actual_Shp_Dt_lag date9.;
Actual_Shp_Dt_lag = lag(Actual_Shp_Dt);
Count_Lag = lag(Count);
Format Target_Shp_dt date9.;
If Prdct_ID ~= Prdct_ID_lag then Actual_Shp_Dt_lag = .;
If Prdct_ID ~= Prdct_ID_lag then Count_Lag = .;
Target_Shp_Dt1 = Actual_Shp_Dt_lag + Count_Lag;
Target_Shp_Dt = Target_Shp_Dt1;
If Actual_Shp_Dt > Target_Shp_Dt1 Then Target_Shp_Dt = Target_Shp_Dt1 + Count_lag;
If Actual_Shp_Dt < Target_Shp_Dt1 Then Target_Shp_Dt = Target_Shp_Dt1 + Count_lag;
Run;

But this code didnt give me right results.

Any help would be appreciated.

After trying new code: Data have:

Cust_ID Prdct_No no_of_mnths Actual_Shp_Dt Count Target_Ship_Dt
x 12 2 8/1/2014 10
x 12 1 8/11/2014 10 8/11/2014
x 12 0 8/23/2014 10
y 13 2 8/1/2014 10
y 13 1 8/11/2014 10 8/11/2014
y 14a 2 8/1/2014 10
y 14a 1 8/11/2014 10 8/11/2014
y 14a 0 8/21/2014 10 8/21/2014
z 15 3 9/1/2014 20
z 15 2 9/15/2014 20

Code that worked:
Data Raw_Data(drop = Prdct_ID_lag Actual_Shp_Dt_lag Count_Lag Target_Shp_Dt1);
Set Raw_Data;
By Cust_ID Prdct_ID;
Cust_ID_lag = lag(Cust_ID);
Prdct_ID_lag = lag(Prdct_ID);
Format Actual_Shp_Dt_lag Target_Shp_Dt date9.;
Actual_Shp_Dt_lag = lag(Actual_Shp_Dt);
Count_Lag = lag(Count);
retain target_shp_dt_lag;
If Cust_ID = Cust_ID_lag and Prdct_ID = Prdct_ID_lag and target_shp_dt_lag = .
Then Target_Shp_Dt = Actual_Shp_Dt_lag + Count_Lag ;
ELSE IF Cust_ID = Cust_ID_lag and Prdct_ID = Prdct_ID_lag
Then Target_Shp_Dt = target_shp_dt_lag + Count_Lag ;
ELSE Target_Shp_Dt = .;
target_shp_dt_lag = Target_Shp_Dt;
Run;


Solution

  • Your definition says that you need to use lag(target_shp_dt) if the initial target_ship_dt isn't equal to actual_shp_dt, but your code never actually does this.

    Update:

    It seems that lag doesn't work for variables created in the same data step, but using retain does, which is why Target_Shp_Dt_lag intially had all missing values.

    Also, are you sure about the very last row in your example dataset? It seems to be inconsistent with your rules: lag(count) + lag(actual_shp_dt) = 9/21/2014, but 9/21/2014 > 9/15/2014, so that would mean you'd need to take lag(count) + lag(target_shp_dt), which results in a missing value as lag(target_shp_date) is missing. The code below works for all the other cases.

    data raw_data;
    input Cust_ID :$1. Prdct_No :$3. no_of_mnths :3. Actual_Shp_Dt :mmddyy10. Count :3.;
    format actual_shp_dt yymmdd10.;
    cards;
    x 12 2 8/1/2014 10 
    x 12 1 8/11/2014 10 
    x 12 0 8/23/2014 10 
    y 13 2 8/1/2014 10 
    y 13 1 8/11/2014 10 
    y 14a 2 8/1/2014 10 
    y 14a 1 8/11/2014 10 
    y 14a 0 8/21/2014 10 
    z 15 3 9/1/2014 20 
    z 15 2 9/15/2014 20 
    ;
    run;
    
    data want;
    input Cust_ID :$1. Prdct_No :$3. no_of_mnths :3. Actual_Shp_Dt :mmddyy10. Count :3. Target_ship_dt :mmddyy10.;
    format actual_shp_dt Target_ship_dt yymmdd10.;
    infile cards missover;
    cards;
    x 12 2 8/1/2014 10 
    x 12 1 8/11/2014 10 8/11/2014 
    x 12 0 8/23/2014 10 8/21/2014 
    y 13 2 8/1/2014 10 
    y 13 1 8/11/2014 10 8/11/2014 
    y 14a 2 8/1/2014 10 
    y 14a 1 8/11/2014 10 8/11/2014 
    y 14a 0 8/21/2014 10 8/21/2014 
    z 15 3 9/1/2014 20
    z 15 2 9/15/2014 20 9/21/2014
    ;
    run;
    
    Data have(drop = Prdct_no_lag Actual_Shp_Dt_lag Count_Lag Target_Shp_Dt1 Target_Shp_Dt_lag) ; 
    Set raw_data; 
    By Cust_ID Prdct_no; 
    Prdct_no_lag = lag(Prdct_no); 
    Actual_Shp_Dt_lag = lag(Actual_Shp_Dt); 
    Count_Lag = lag(Count); 
    retain target_shp_dt_lag;
    if first.prdct_no then call missing(Actual_Shp_Dt_lag, count_lag, prdct_no_lag, target_shp_dt_lag);
    Target_Shp_Dt1 = Actual_Shp_Dt_lag + Count_Lag; 
    Target_Shp_Dt = Target_Shp_Dt1; 
    If Actual_Shp_Dt ne Target_Shp_Dt1 Then Target_Shp_Dt = Target_Shp_Dt_lag + Count_lag; 
    Target_Shp_Dt_lag = Target_Shp_Dt; 
    format Actual_Shp_Dt_lag Target_Shp_dt: yymmdd10.; 
    Run; 
    
    proc compare base = want compare = have;
      var Target_Shp_Dt;
    run;