I have a a customer level data with customer's Pre-covid, post-covid and In-covid Balances. The data look something like this
Accountid | Covid Flag | Balance |
---|---|---|
123 | Pre-Covid | 100 |
123 | In-Covid | 200 |
123 | Post-Covid | 400 |
I need to create a new column with % difference between these covid flags. So the extra column should create the % difference between the balance of pre-covid period to in covid(row 1 to 2), in covid to pre-covid(from row 2 to 3) and finally from pre-covid to post covid(row 1 and 3)
the final data should look something like this
Accountid | COVID FLAG | % Difference |
---|---|---|
123 | pre to in Covid | 100% |
123 | In to Post Covid | 100% |
123 | pre to Post Covid | 300% |
How do I create the % difference column and the new covid Flag?
i can only think of Lag function to do this, i can us ethe lag function for 1 to 2, 2 to 3 , but how do i do this for 1 to 3?
Since there are only three values, we can use some simple data step logic to store all of our values of interest into temporary variables as we find them, then output them one at a time at the last row of each account ID. To illustrate this, here is what the background calculations look like as we read row-by-row:
accountid covid_flag balance pre_covid in_covid post_covid pct_diff
123 Pre-Covid 100 100 . . .
123 In-Covid 200 100 200 . .
123 Post-Covid 400 100 200 400 .
----------------------------------------------------------------------------------------
Point where we output and calculate % diff
----------------------------------------------------------------------------------------
123 pre to in Covid 400 100 200 400 100%
123 In to Post Covid 400 100 200 400 100%
123 pre to Post Covid 400 100 200 400 300%
Here's how this code looks:
data want;
set have;
by accountid;
/* Temporary variables to hold the balance found in each row */
retain pre_covid in_covid post_covid;
/* Reset temporary variables at the start of each account ID */
if(first.accountid) then call missing(pre_covid, in_covid, post_covid);
/* Save each covid flag balance to temporary variables */
select(upcase(covid_flag) );
when('PRE-COVID') pre_covid = balance;
when('IN-COVID') in_covid = balance;
when('POST-COVID') post_covid = balance;
end;
/* Uncomment to view intermediate steps */
/* output;*/
/* At the very last account, calculate the differences and output for each one */
if(last.accountid) then do;
covid_flag = 'pre to in Covid';
pct_diff = (in_covid - pre_covid)/pre_covid;
output;
covid_flag = 'In to Post Covid';
pct_diff = (post_covid - in_covid)/in_covid;
output;
covid_flag = 'pre to Post Covid';
pct_diff = (post_covid - pre_covid)/pre_covid;
output;
end;
format pct_diff percent8.;
run;