Search code examples
sasnumbersseries

How to create a series of number based on two variables?


Let's say I have this df

data df;
input Country $ Year;
datalines;
USA 2019
USA 2019
USA 2020
Canada 2019
Canada 2020
Canada 2020
;
run;

I know how to create a series of numbers based on one variable but how to make it based on two variables. I want the following output

Country   Year   series
-------   ----   ------
USA       2019   1
USA       2019   2
USA       2020   1
Canada    2019   1
Canada    2020   1
Canada    2020   2

Solution

  • It's nearly the same way as you would with one variable. Specify both groups in your by statement.

    data want;
        set df;
        by descending country year;
    
        if(first.year) then series = 0;
    
        series+1;
    run;
    
    Country Year    series
    USA     2019    1
    USA     2019    2
    USA     2020    1
    Canada  2019    1
    Canada  2020    1
    Canada  2020    2
    

    In this code, saying if(first.year) implies that it is the first year in the group (country year). Your by group can be thought of a hierarchy in this sense:

    by var1 var2 var3 var4 ... varn

    first. Group
    first.var1 var1
    first.var2 var1 var2
    first.var3 var1 var2 var3
    first.var4 var1 var2 var3 var4
    ... ...

    You can verify this by adding a binary variable showing the first values of each group. Think of it as a shortcut for if(first.var1 AND first.var2). first.var2 already implies this.

    data want;
        set df;
        by descending country year;
    
        first_country      = (first.country);
        first_country_year = (first.year);
    run;
    
    Country Year    first_country   first_country_year
    USA     2019    1               1
    USA     2019    0               0
    USA     2020    0               1
    Canada  2019    1               1
    Canada  2020    0               1
    Canada  2020    0               0