Search code examples
sqljoinsascartesian

Cartesian Join SAS Proc SQL


I have two data sets that look like this:

data sales;
    format week date9.;
    input store $ week date9. sales;
cards;
A 01JAN2014 200
A 08JAN2014 500
A 22JAN2014 200
B 01JAN2014 100
B 08JAN2014 200
B 15JAN2014 200
;
run;

data dates;
    format week date9.;
    input week date9.;
cards;
01JAN2014
08JAN2014
15JAN2014
22JAN2014
29JAN2014
;
run;

These are small examples of very large tables in a database. I want to join them using a proc sql step, so that all dates are present for each location (with missing values if no sales). I've been creating a cartesian join of the stores and weeks to produce the desired result, but this method has proven to be extremely inefficient when I attempt to query large amounts of data. I know there is a better way to do this, but cannot get it. Below is my attempt...to clarify, the full_sales data set IS the desired output, I just need a more efficient way to do it. Thanks.

proc sql;
    create table cartesian as
    select distinct sales.store, dates.week
    from sales, dates
    order by 1,2;
quit;

proc sql;
    create table full_sales as
    select cartesian.store, 
                cartesian.week,
                sales.sales
    from cartesian
    left join sales
        on cartesian.store=sales.store and
        cartesian.week=sales.week
    order by 1,2
    ;
quit;

Solution

  • I would approach this as:

    proc sql;
        create table full_sales as
        select s.store, w.week, ssa.sales
        from (select distinct store from sales) s cross join
             dates w left join
             sales sa
             on s.store = sa.store and
                w.week = sa.week
        order by 1,2
        ;
    quit;
    

    This does not require the auxiliary table cartesian, which might be faster. An index on sales(store, week) would definitely speed the query.