I have been trying to calculate a new field based on existing fields using SAS. I tried lag function but didn't work. Below is the data and program I have tried.
Raw Data:
Cust_ID Prdct_No no_of_mnths Actual_Shp_Dt Count
x 12 2 8/1/2014 10
x 12 1 8/11/2014 10
x 12 0 8/23/2014 10
y 13 2 8/1/2014 10
y 13 1 8/11/2014 10
y 14a 2 8/1/2014 10
y 14a 1 8/11/2014 10
y 14a 0 8/21/2014 10
z 15 3 9/1/2014 20
z 15 2 9/15/2014 20
Required results:
Target_Ship_Dt is sum of (lag(Actual_Shp_Dt) and lag(Count)) grouped by Cust_ID and Prdct_No. If Actual_Shp_Dt is late or earlier than target_shp_Dt, then Target_Shp_Dt is SUM(lag(Target_Shp_Dt) and lag(Count)).
Cust_ID Prdct_No no_of_mnths Actual_Shp_Dt Count Target_Ship_Dt
x 12 2 8/1/2014 10
x 12 1 8/11/2014 10 8/11/2014
x 12 0 8/23/2014 10 8/21/2014
y 13 2 8/1/2014 10
y 13 1 8/11/2014 10 8/11/2014
y 14a 2 8/1/2014 10
y 14a 1 8/11/2014 10 8/11/2014
y 14a 0 8/21/2014 10 8/21/2014
z 15 3 9/1/2014 20
z 15 2 9/15/2014 20 9/21/2014
Code I have tried:
Data Raw_Data(drop = Prdct_ID_lag Actual_Shp_Dt_lag Count_Lag Target_Shp_Dt1);
Set Raw_Data;
By Cust_ID Prdct_ID;
Prdct_ID_lag = lag(Prdct_ID);
Format Actual_Shp_Dt_lag date9.;
Actual_Shp_Dt_lag = lag(Actual_Shp_Dt);
Count_Lag = lag(Count);
Format Target_Shp_dt date9.;
If Prdct_ID ~= Prdct_ID_lag then Actual_Shp_Dt_lag = .;
If Prdct_ID ~= Prdct_ID_lag then Count_Lag = .;
Target_Shp_Dt1 = Actual_Shp_Dt_lag + Count_Lag;
Target_Shp_Dt = Target_Shp_Dt1;
If Actual_Shp_Dt > Target_Shp_Dt1 Then Target_Shp_Dt = Target_Shp_Dt1 + Count_lag;
If Actual_Shp_Dt < Target_Shp_Dt1 Then Target_Shp_Dt = Target_Shp_Dt1 + Count_lag;
Run;
But this code didnt give me right results.
Any help would be appreciated.
After trying new code: Data have:
Cust_ID Prdct_No no_of_mnths Actual_Shp_Dt Count Target_Ship_Dt
x 12 2 8/1/2014 10
x 12 1 8/11/2014 10 8/11/2014
x 12 0 8/23/2014 10
y 13 2 8/1/2014 10
y 13 1 8/11/2014 10 8/11/2014
y 14a 2 8/1/2014 10
y 14a 1 8/11/2014 10 8/11/2014
y 14a 0 8/21/2014 10 8/21/2014
z 15 3 9/1/2014 20
z 15 2 9/15/2014 20
Code that worked:
Data Raw_Data(drop = Prdct_ID_lag Actual_Shp_Dt_lag Count_Lag Target_Shp_Dt1);
Set Raw_Data;
By Cust_ID Prdct_ID;
Cust_ID_lag = lag(Cust_ID);
Prdct_ID_lag = lag(Prdct_ID);
Format Actual_Shp_Dt_lag Target_Shp_Dt date9.;
Actual_Shp_Dt_lag = lag(Actual_Shp_Dt);
Count_Lag = lag(Count);
retain target_shp_dt_lag;
If Cust_ID = Cust_ID_lag and Prdct_ID = Prdct_ID_lag and target_shp_dt_lag = .
Then Target_Shp_Dt = Actual_Shp_Dt_lag + Count_Lag ;
ELSE IF Cust_ID = Cust_ID_lag and Prdct_ID = Prdct_ID_lag
Then Target_Shp_Dt = target_shp_dt_lag + Count_Lag ;
ELSE Target_Shp_Dt = .;
target_shp_dt_lag = Target_Shp_Dt;
Run;
Your definition says that you need to use lag(target_shp_dt) if the initial target_ship_dt isn't equal to actual_shp_dt, but your code never actually does this.
Update:
It seems that lag
doesn't work for variables created in the same data step, but using retain
does, which is why Target_Shp_Dt_lag
intially had all missing values.
Also, are you sure about the very last row in your example dataset? It seems to be inconsistent with your rules: lag(count) + lag(actual_shp_dt) = 9/21/2014, but 9/21/2014 > 9/15/2014, so that would mean you'd need to take lag(count) + lag(target_shp_dt), which results in a missing value as lag(target_shp_date) is missing. The code below works for all the other cases.
data raw_data;
input Cust_ID :$1. Prdct_No :$3. no_of_mnths :3. Actual_Shp_Dt :mmddyy10. Count :3.;
format actual_shp_dt yymmdd10.;
cards;
x 12 2 8/1/2014 10
x 12 1 8/11/2014 10
x 12 0 8/23/2014 10
y 13 2 8/1/2014 10
y 13 1 8/11/2014 10
y 14a 2 8/1/2014 10
y 14a 1 8/11/2014 10
y 14a 0 8/21/2014 10
z 15 3 9/1/2014 20
z 15 2 9/15/2014 20
;
run;
data want;
input Cust_ID :$1. Prdct_No :$3. no_of_mnths :3. Actual_Shp_Dt :mmddyy10. Count :3. Target_ship_dt :mmddyy10.;
format actual_shp_dt Target_ship_dt yymmdd10.;
infile cards missover;
cards;
x 12 2 8/1/2014 10
x 12 1 8/11/2014 10 8/11/2014
x 12 0 8/23/2014 10 8/21/2014
y 13 2 8/1/2014 10
y 13 1 8/11/2014 10 8/11/2014
y 14a 2 8/1/2014 10
y 14a 1 8/11/2014 10 8/11/2014
y 14a 0 8/21/2014 10 8/21/2014
z 15 3 9/1/2014 20
z 15 2 9/15/2014 20 9/21/2014
;
run;
Data have(drop = Prdct_no_lag Actual_Shp_Dt_lag Count_Lag Target_Shp_Dt1 Target_Shp_Dt_lag) ;
Set raw_data;
By Cust_ID Prdct_no;
Prdct_no_lag = lag(Prdct_no);
Actual_Shp_Dt_lag = lag(Actual_Shp_Dt);
Count_Lag = lag(Count);
retain target_shp_dt_lag;
if first.prdct_no then call missing(Actual_Shp_Dt_lag, count_lag, prdct_no_lag, target_shp_dt_lag);
Target_Shp_Dt1 = Actual_Shp_Dt_lag + Count_Lag;
Target_Shp_Dt = Target_Shp_Dt1;
If Actual_Shp_Dt ne Target_Shp_Dt1 Then Target_Shp_Dt = Target_Shp_Dt_lag + Count_lag;
Target_Shp_Dt_lag = Target_Shp_Dt;
format Actual_Shp_Dt_lag Target_Shp_dt: yymmdd10.;
Run;
proc compare base = want compare = have;
var Target_Shp_Dt;
run;