Insert rows based on group ID and difference sum between two datasets

Hello Stack community,

I have a problem where I would appreciate your time and help.

Let's say I have data A with group ID 'A' and the total sum of the group ID 'A' is 11. I have another data B with the same group ID 'A' and the total sum of the group ID 'A' is 20. Now the difference is 9, I want to expand this difference of 9 into 9 rows by that group ID 'A' and /append/insert into data A. I have put the tables for your reference.

data A

Group   Sum
A       1
A       3
A       4
A       1
A       2
Total   11

data b

Group   Sum
A       5
A       2
A       3
A       5
A       5
Total   20

expand the difference of 9 into rows

Group   Count      
A       1
A       1
A       1
A       1
A       1
A       1
A       1
A       1
A       1

data want

Group   Sum
A       1
A       3
A       4
A       1
A       2
Total   11
A       1
A       1
A       1
A       1
A       1
A       1
A       1
A       1

Friends, I really appreciate and thank you for your time and help on this.

I didn't program this yet. I am not sure how to solve this.

Solution

This is a strange table format, but given the information you've posted here is a solution that will work for any number of groups. Note that it is not good practice to keep a total row within your data. You can always calculate the total with SQL or PROCs.

First, calculate the differences for all groups:

proc sql;
    create table difs as
        select a.group, b.total - a.total as dif
        from (select group, sum(sum) as total
              from a
              where group NE 'Total'
              group by group
             ) as a
        LEFT JOIN
            (select group, sum(sum) as total
              from b
              where group NE 'Total'
              group by group
            ) as b
        ON a.group = b.group
        order by group
    ;
quit;

This creates the following table:

group   dif
A       9

Next, we need to add a 1 for every value in the group:

data counts;
    set difs;
    by group;

    do i = 1 to dif;
        sum = 1;
        output;
    end;

    drop i dif;
run;

This creates the following table:

group   sum
A       1
A       1
A       1
A       1
A       1
A       1
A       1
A       1
A       1

Now we simply append this to the original table to get the desired output:

data want;
    set a counts;
run;

Which produces the table we need:

group   sum
A       1
A       3
A       4
A       1
A       2
Total   11
A       1
A       1
A       1
A       1
A       1
A       1
A       1
A       1
A       1