How to create a series of number based on two variables?

Let's say I have this df

data df;
input Country $ Year;
datalines;
USA 2019
USA 2019
USA 2020
Canada 2019
Canada 2020
Canada 2020
;
run;

I know how to create a series of numbers based on one variable but how to make it based on two variables. I want the following output

Country   Year   series
-------   ----   ------
USA       2019   1
USA       2019   2
USA       2020   1
Canada    2019   1
Canada    2020   1
Canada    2020   2

Solution

It's nearly the same way as you would with one variable. Specify both groups in your by statement.

data want;
    set df;
    by descending country year;

    if(first.year) then series = 0;

    series+1;
run;

Country Year    series
USA     2019    1
USA     2019    2
USA     2020    1
Canada  2019    1
Canada  2020    1
Canada  2020    2

In this code, saying if(first.year) implies that it is the first year in the group (country year). Your by group can be thought of a hierarchy in this sense:

by var1 var2 var3 var4 ... varn

first.	Group
first.var1	var1
first.var2	var1 var2
first.var3	var1 var2 var3
first.var4	var1 var2 var3 var4
...	...

You can verify this by adding a binary variable showing the first values of each group. Think of it as a shortcut for if(first.var1 AND first.var2). first.var2 already implies this.

data want;
    set df;
    by descending country year;

    first_country      = (first.country);
    first_country_year = (first.year);
run;

Country Year    first_country   first_country_year
USA     2019    1               1
USA     2019    0               0
USA     2020    0               1
Canada  2019    1               1
Canada  2020    0               1
Canada  2020    0               0