Search code examples
sqlrsasinner-joinsqldf

How do I join two dataframes, based on conditions, with no common variable?


I am trying to recreate the following SAS code in R

PROC SQL;
    create table counts_2018 as 
    select a.*, b.cell_no 
    from work.universe201808 a, work.selpar17 b 
    where a.newregionxx = b.lower_region2 
      and a.froempment >= b.lower_size 
      and a.froempment <= b.upper_size 
      and a.frosic07_2 >= b.lower_class2 
      and a.frosic07_2 <= b.upper_class2;
QUIT;

What this does, in effect, is assign the cell_no found in selpar17 to the data in universe201808, based on the fulfillment of all 6 conditions outlined in the code. Data which does not fulfill these conditions, and thus won't have a cell_no assigned to it, is not included in the final table.

The documentation/answers I have found so far all start with a step where the two dataframes are merged by a common variable, then an sqldf select is carried out. I do not have a common column, and thus I cannot merge my dataframes.


Solution

  • Currently, you are running an implicit join between the two tables which is not advised in SQL. Per ANSI-1992 (a 25+ year specification) that made the explicit JOIN the standard way of joining relations, consider revising your SQL query accordingly.

    Contrary to your statement, you in fact do have a common column between the tables as shown in your equality condition: a.newregionxx = b.lower_region2 which can serve as the JOIN condition. Even use the BETWEEN operator for concision:

    new_df <- sqldf('select u.*, s.cell_no 
                     from universe201808 u
                     inner join selpar17 s 
                             on u.newregionxx = s.lower_region2 
                     where u.froempment between s.lower_size and s.upper_size 
                       and u.frosic07_2 between s.lower_class2 and s.upper_class2')
    

    In fact, you can remove the where altogether and place all in the on clause:

    ...
    on u.newregionxx = s.lower_region2 
    and u.froempment between s.lower_size and s.upper_size 
    and u.frosic07_2 between s.lower_class2 and s.upper_class2