Search code examples
sqlsasleft-join

Sum and count variables while executing a left join


Example code--SQL within SAS:

proc sql;
  create table add_losses as 
  select *, 
    sum(bb.gross_loss) as gl format = comma15.2, 
    count(bb.gross_loss) as n_losses
  from add_startend as aa   
  left join LED as bb 
    on (aa.process_name = bb.process_name and
        aa.group_id = bb.group_code and
        aa.start_date le bb.first_loss_posting_date le aa.end_date)

  group by aa.process_name, aa.group_id, aa.start_date, aa.end_date
  order by aa.process_name, aa.group_id, aa.start_date, aa.end_date;
quit;

Example data and desired output below:

Table AA

variable 1  variable 2  start date  end date
AAAA        BBB         1/1/2010    6/1/2010

Table BB

variable 1  variable 2  Date      losses
AAAA        BBB         1/5/2010    100
AAAA        BBB         2/1/2010    100
AAAA        BBB         3/5/2010    100
AAAA        BBB         4/23/2010   100
AAAA        BBB         5/11/2010   100
AAAA        BBB         5/25/2010   100

Table YY (current output)

variable 1  variable 2  Date    gross_loss  gl  n_losses
AAAA        BBB         1/5/2010    100     600 6
AAAA        BBB         2/1/2010    100     600 6
AAAA        BBB         3/5/2010    100     600 6
AAAA        BBB         4/23/2010   100     600 6
AAAA        BBB         5/11/2010   100     600 6
AAAA        BBB         5/25/2010   100     600 6

Table XX (desired output)

variable 1  variable 2  start date  end date    gl  n_losses
AAAA        BBB         1/1/2010    6/1/2010    600     6

The problem is the current code creates additional observations. How do I keep the same number of rows and all variables in table AA while adding on the columns gl and n_losses?


Solution

  • You can use Between clause for this type of condition

    proc sql;
    create table add_losses as 
    select aa.*, bb.gl format, bb.n_losses
    from
    add_startend as aa
    left join
    (
    select aa.process_name, aa.group_id, aa.start_date, aa.end_date, 
    sum(bb.gross_loss) as gl format = comma15.2, 
    count(bb.gross_loss) as n_losses
    from add_startend as aa     
    left join LED as bb 
    
    on (aa.process_name = bb.process_name and
    aa.group_id = bb.group_code and
    bb.first_loss_posting_date between aa.start_date and aa.end_date)
    
    group by aa.process_name, aa.group_id, aa.start_date, aa.end_date
    ) bb
    on  aa.process_name = bb.process_name 
    and aa.group_id = bb.group_code
    and aa.start_date = aa.start_date
    and aa.end_date = bb.end_date
    order by aa.process_name, aa.group_id, aa.start_date, aa.end_date;
    quit;