Search code examples
joinsasleft-joinsas-studio

Left join returns blanks


I have performed a left join where the left table has 500,000 observations. In some cases the left join has been successful for Business_Line = "Retail" where as the next observation is left blank, why is this?

the code I used:

proc sql;
create table joined2 as
select a.*
      ,b.Join1
      ,b.Join2
      ,b.Join3
from joined as a
left join Sasdata.Assumptions as b
on a.Business_Line = b.Business_Line;
quit;

the two tables look like

data joined;
input Business_Line $;
datalines;
Retail
Retail
Retail
Business
Business
;
run;

the table to join looks like

data sasdata.assumptions;
input Business_Line $ Join1 Join2 Join3;
datalines;
Retail 10% 10% 10%
Business 20% 10% 5%
;
run;

the current resulting table looks like

 business_line join1 join2 join3
 Retail 10% 10% 10%
 Retail . . .
 Business 20% 10% 5%
 Business . . .

Solution

  • The sample code does not demonstrate the issue.

    Indeed, missing values for join1-join3 will not happen when the actual business_lines values are 'Retail' or Business. You get a result with 3x1 rows with Retail and 2x1 rows with Business.

    The missing values occur when the join key in the left table does not have a corresponding match in the right table. This can appear to happen in SAS if the variables are formatted.

    Suppose business_line was an integer with formatted value

    proc format;
      value line
        101 = 'Retail'
        102 = 'Retail'
        103 = 'Retail'
        201 = 'Business'
        202 = 'Business'
      ;
    

    Updated data with formatted business_line

    data joined;
    input Business_Line;
    format Business_Line line.;
    datalines;
    101
    102
    102
    201
    202
    run;    
    
    data assumptions;
    input Business_Line Join1 Join2 Join3;
    format Business_Line line.;
    datalines;
    101 10 10 10
    201 20 10  5
    run;
    

    Join that has some unmatched underlying values

    proc sql;
    create table joined2 as
    select a.*
          ,b.Join1
          ,b.Join2
          ,b.Join3
    from joined as a
    left join Assumptions as b
    on a.Business_Line = b.Business_Line;
    quit;
    
    options nocenter; ods listing;
    proc print data=joined2;
    run;
    

    Has results demonstrating missing values

           Business_
    Obs      Line       Join1    Join2    Join3
    
     1     Retail         10       10       10
     2     Retail          .        .        .
     3     Retail          .        .        .
     4     Business       20       10        5
     5     Business        .        .        .