Let's say I have this df
data df;
input Country $ Year;
datalines;
USA 2019
USA 2019
USA 2020
Canada 2019
Canada 2020
Canada 2020
;
run;
I know how to create a series of numbers based on one variable but how to make it based on two variables. I want the following output
Country Year series
------- ---- ------
USA 2019 1
USA 2019 2
USA 2020 1
Canada 2019 1
Canada 2020 1
Canada 2020 2
It's nearly the same way as you would with one variable. Specify both groups in your by
statement.
data want;
set df;
by descending country year;
if(first.year) then series = 0;
series+1;
run;
Country Year series
USA 2019 1
USA 2019 2
USA 2020 1
Canada 2019 1
Canada 2020 1
Canada 2020 2
In this code, saying if(first.year)
implies that it is the first year in the group (country year)
. Your by
group can be thought of a hierarchy in this sense:
by var1 var2 var3 var4 ... varn
first. | Group |
---|---|
first.var1 | var1 |
first.var2 | var1 var2 |
first.var3 | var1 var2 var3 |
first.var4 | var1 var2 var3 var4 |
... | ... |
You can verify this by adding a binary variable showing the first values of each group. Think of it as a shortcut for if(first.var1 AND first.var2)
. first.var2
already implies this.
data want;
set df;
by descending country year;
first_country = (first.country);
first_country_year = (first.year);
run;
Country Year first_country first_country_year
USA 2019 1 1
USA 2019 0 0
USA 2020 0 1
Canada 2019 1 1
Canada 2020 0 1
Canada 2020 0 0