Search code examples
sqlsascross-join

Cross-join issue in SQL statement (SAS)


Can anyone please explain to me why this insists on doing a Cartesian product on the join? To me it would seem logical for it to use the index. Is there a way to force it to use the index (you can see I've tried but it's ignoring my idxname statement).

The below example is the simplest way I could reproduce the problem I am having and is not my actual code.

Create some test data:

data all_periods;
  do date=mdy(1,1,2015) to mdy(4,1,2015);
    do hour=0 to 23;
      do minute=0 to 59; 
        period_start = dhms(date,hour,minute,0);
        output;
      end;
    end;
  end;
run;

Create an index on the test data:

proc sql noprint;
  create index period_start on all_periods;
quit;

Perform a self-join:

proc sql noprint _method;
  create table concurrent as 
  select a.period_start,
         count(*) as result
  from all_periods a
  join all_periods (idxname=period_start) b on b.period_start lt a.period_start
  group by 1
  ;
quit;

Solution

  • Non-equijoins often don't use indexes. I would actually recommend that you do this in a data step, using retain. However, you might try this version in proc sql:

    select a.period_start,
           (select count(*)
            from all_periods b
            where b.period_start < a.period_start
           ) as result
    from all_periods a;
    

    Most databases have cumulative sums built-in using window functions. However, SAS does not support those natively. So, proc sql is not the best way to do this calculation.