Search code examples
sasdataset

Insert rows based on group ID and difference sum between two datasets


Hello Stack community,

I have a problem where I would appreciate your time and help.

Let's say I have data A with group ID 'A' and the total sum of the group ID 'A' is 11. I have another data B with the same group ID 'A' and the total sum of the group ID 'A' is 20. Now the difference is 9, I want to expand this difference of 9 into 9 rows by that group ID 'A' and /append/insert into data A. I have put the tables for your reference.

data A  

Group   Sum
A       1
A       3
A       4
A       1
A       2
Total   11

data b 

Group   Sum
A       5
A       2
A       3
A       5
A       5
Total   20

expand the difference of 9 into rows 

Group   Count      
A       1
A       1
A       1
A       1
A       1
A       1
A       1
A       1
A       1

data want 

Group   Sum
A       1
A       3
A       4
A       1
A       2
Total   11
A       1
A       1
A       1
A       1
A       1
A       1
A       1
A       1

Friends, I really appreciate and thank you for your time and help on this.

I didn't program this yet. I am not sure how to solve this.


Solution

  • This is a strange table format, but given the information you've posted here is a solution that will work for any number of groups. Note that it is not good practice to keep a total row within your data. You can always calculate the total with SQL or PROCs.

    First, calculate the differences for all groups:

    proc sql;
        create table difs as
            select a.group, b.total - a.total as dif
            from (select group, sum(sum) as total
                  from a
                  where group NE 'Total'
                  group by group
                 ) as a
            LEFT JOIN
                (select group, sum(sum) as total
                  from b
                  where group NE 'Total'
                  group by group
                ) as b
            ON a.group = b.group
            order by group
        ;
    quit;
    

    This creates the following table:

    group   dif
    A       9
    

    Next, we need to add a 1 for every value in the group:

    data counts;
        set difs;
        by group;
    
        do i = 1 to dif;
            sum = 1;
            output;
        end;
    
        drop i dif;
    run;
    

    This creates the following table:

    group   sum
    A       1
    A       1
    A       1
    A       1
    A       1
    A       1
    A       1
    A       1
    A       1
    

    Now we simply append this to the original table to get the desired output:

    data want;
        set a counts;
    run;
    

    Which produces the table we need:

    group   sum
    A       1
    A       3
    A       4
    A       1
    A       2
    Total   11
    A       1
    A       1
    A       1
    A       1
    A       1
    A       1
    A       1
    A       1
    A       1