Search code examples
sassas-macro

Calculate the % difference between two rows in SAS


I have a a customer level data with customer's Pre-covid, post-covid and In-covid Balances. The data look something like this

Accountid Covid Flag Balance
123 Pre-Covid 100
123 In-Covid 200
123 Post-Covid 400

I need to create a new column with % difference between these covid flags. So the extra column should create the % difference between the balance of pre-covid period to in covid(row 1 to 2), in covid to pre-covid(from row 2 to 3) and finally from pre-covid to post covid(row 1 and 3)

the final data should look something like this

Accountid COVID FLAG % Difference
123 pre to in Covid 100%
123 In to Post Covid 100%
123 pre to Post Covid 300%

How do I create the % difference column and the new covid Flag?

i can only think of Lag function to do this, i can us ethe lag function for 1 to 2, 2 to 3 , but how do i do this for 1 to 3?


Solution

  • Since there are only three values, we can use some simple data step logic to store all of our values of interest into temporary variables as we find them, then output them one at a time at the last row of each account ID. To illustrate this, here is what the background calculations look like as we read row-by-row:

    accountid       covid_flag        balance   pre_covid   in_covid    post_covid  pct_diff
    123             Pre-Covid         100       100         .           .           .
    123             In-Covid          200       100         200         .           .
    123             Post-Covid        400       100         200         400         .
    ----------------------------------------------------------------------------------------
                           Point where we output and calculate % diff
    ----------------------------------------------------------------------------------------
    123             pre to in Covid   400       100         200         400         100%
    123             In to Post Covid  400       100         200         400         100%
    123             pre to Post Covid 400       100         200         400         300%
    

    Here's how this code looks:

    data want;
        set have;
        by accountid;
    
        /* Temporary variables to hold the balance found in each row */
        retain pre_covid in_covid post_covid;
    
        /* Reset temporary variables at the start of each account ID */
        if(first.accountid) then call missing(pre_covid, in_covid, post_covid);
    
        /* Save each covid flag balance to temporary variables */
        select(upcase(covid_flag) );
            when('PRE-COVID')  pre_covid   = balance;
            when('IN-COVID')   in_covid    = balance;
            when('POST-COVID') post_covid  = balance;
        end;
    
        /* Uncomment to view intermediate steps */
        /* output;*/
    
        /* At the very last account, calculate the differences and output for each one */
        if(last.accountid) then do;
            covid_flag = 'pre to in Covid';
            pct_diff   = (in_covid - pre_covid)/pre_covid;
            output;
    
            covid_flag = 'In to Post Covid';
            pct_diff   = (post_covid - in_covid)/in_covid;
            output;
    
            covid_flag = 'pre to Post Covid';
            pct_diff   = (post_covid - pre_covid)/pre_covid;
            output;
        end;
    
        format pct_diff percent8.;
    run;