I use SAS EG and have a data set that looks like:
CLIENT_ID Segment Yearmonth
XXXX A 201305
XXXX A 201306
XXXX A 201307
YYYY A 201305
YYYY A 201306
YYYY B 201307
i want an output that has a number assigned to a new column which resets when a new account is there:
CLIENT_ID Segment Yearmonth New_Variable
XXXX A 201305 1
XXXX A 201306 2
XXXX A 201307 3
YYYY A 201305 1
YYYY A 201306 2
YYYY B 201307 3
That was problem number one, which i solved with this code:
PROC SORT DATA= GENERAL.HISTORICAL_SEGMENTS;
by Client_ID;
RUN;
data HISTORICAL_SEGMENTS2;
SET GENERAL.HISTORICAL_SEGMENTS;
count + 1;
by Client_ID;
if first.Client_ID then count = 1;
run;
I want to create a second data set and i want to see if there is a way to get the segments only if the segment changes: For example from the above the
CLIENT_ID Segment Yearmonth New_Variable
YYYY A 201305 1
YYYY B 201306 2
Any help would be appreciated. Thanks.
Nice job on answering your first question. I think that step reads more clearly if you rearrange it a bit, e.g.:
data HISTORICAL_SEGMENTS2 ;
set GENERAL.HISTORICAL_SEGMENTS ;
by Client_ID ;
if first.Client_ID then count = 0 ;
count + 1 ;
run;
I think it's customary to put the BY statement right after the SET statement it applies to, for clarity sake. Reset the counter to 0 when Client_ID changes.
It looks like you want a second dataset, call it FIRSTS, with the first record from each by group. To do that, note that it's possible for one DATA step to write multiple output datasets. This can be done by using an explicit OUTPUT statement to write to each dataset, e.g. :
data HISTORICAL_SEGMENTS2 FIRSTS ;
set GENERAL.HISTORICAL_SEGMENTS ;
by Client_ID ;
if first.Client_ID then count = 0 ;
count + 1 ;
output HISTORICAL_SEGMENTS2 ; *output every record;
if first.Client_ID then output FIRSTS ; *output first of each group;
run;