Search code examples
sasdummy-data

How to create dummy variables to indicate two values are the same using SAS


My data looks like:

ID      YEAR    A   B
1078    1989    1   0
1078    1999    1   1
1161    1969    0   0
1161    2002    1   1
1230    1995    0   0
1230    2002    0   1
1279    1996    0   0
1279    2003    0   1
1447    1993    1   0
1447    2001    1   1
1487    1967    0   0
1487    2008    1   1
1487    2008    1   0
1487    2009    0   1
1678    1979    1   0
1678    2002    1   1
1690    1989    1   0
1690    1993    0   1
1690    1993    0   0
1690    1996    0   1
1690    1996    0   0
1690    1997    1   1

I'd like to create two dummy variables, new and X, the scenarios are as follows: within each ID-B pair (a pair is 2 observations one with B=0 and the other B=1 with YEAR closet together in sequence)

  1. if the observation with B=1 has a value of 1 for A then new=1 for both observations in that pair, otherwise it is 0 for both observations in that pair, and
  2. if the pair has the same value in A then X=0 and if they have different values then X=1.

Therefore, the output would be:

ID      YEAR    A   B   new X
1078    1989    1   0   1   0
1078    1999    1   1   1   0
1161    1969    0   0   1   1
1161    2002    1   1   1   1
1230    1995    0   0   0   0
1230    2002    0   1   0   0
1279    1996    0   0   0   0
1279    2003    0   1   0   0
1447    1993    1   0   1   1
1447    2001    1   1   1   1
1487    1967    0   0   1   1
1487    2008    1   1   1   1
1487    2008    1   0   0   1
1487    2009    0   1   0   1
1678    1979    1   0   1   0
1678    2002    1   1   1   0
1690    1989    1   0   0   1
1690    1993    0   1   0   1
1690    1993    0   0   0   0
1690    1996    0   1   0   0
1690    1996    0   0   1   1
1690    1997    1   1   1   1

My codes are

data want;
 set have;
 by ID;
 if B=1 and A=1 then new=1;
 else new=0;
run;

proc sql;
    create table out as
    select a.*,max(a.B=a.A & a.B=1) as new,^(min(A)=max(A)) as X
    from have a
    group by ID;quit;

The first one doesn't work and the second one reordered variable B. I am stuck here. Any help will be greatly appreciated.


Solution

  • Very manual solution, I just used the retain statement to identify the pairs (dataset already in the required order).

        data start;
            set start;
            retain pair 0;
            if B=0 then pair=pair+1;
        run;
    
        data ForNew;
            set start(where=(B=1));
            New=(A=B);  /*Boolean variable=1 if the condition in brackets is true*/
            keep pair New;
        run;
    /*if A has equal values mean will be 0 or 1*/
        proc means data=start NWAY NOPRINT;
            class pair;
            var A;
            output out=ForX(drop=_: where=(media in (0,1)) keep=pair media) mean(A)=media;
        run;
    
        data end;
            merge start ForNew ForX(in=INX drop=media);
            by pair;
            X=(^INX);
        run;