Can anyone please explain to me why this insists on doing a Cartesian product on the join? To me it would seem logical for it to use the index. Is there a way to force it to use the index (you can see I've tried but it's ignoring my idxname
statement).
The below example is the simplest way I could reproduce the problem I am having and is not my actual code.
Create some test data:
data all_periods;
do date=mdy(1,1,2015) to mdy(4,1,2015);
do hour=0 to 23;
do minute=0 to 59;
period_start = dhms(date,hour,minute,0);
output;
end;
end;
end;
run;
Create an index on the test data:
proc sql noprint;
create index period_start on all_periods;
quit;
Perform a self-join:
proc sql noprint _method;
create table concurrent as
select a.period_start,
count(*) as result
from all_periods a
join all_periods (idxname=period_start) b on b.period_start lt a.period_start
group by 1
;
quit;
Non-equijoins often don't use indexes. I would actually recommend that you do this in a data step, using retain
. However, you might try this version in proc sql
:
select a.period_start,
(select count(*)
from all_periods b
where b.period_start < a.period_start
) as result
from all_periods a;
Most databases have cumulative sums built-in using window functions. However, SAS does not support those natively. So, proc sql
is not the best way to do this calculation.