I have a SAS dataset for 200million records with Date, Id and amount for year 2022
Have:
ID Date Amount
101 1/31/2022 50
102 1/31/2022 100
101 2/28/2022 25
.....................
I am trying to get the unique # of accounts for each month but the it is taking more than an hour to give me the results
proc sql;
create table unique as select distinct
date,
count(distinct id) as uniqid
from Have
group by date
quit;
Is there an alternate method to have the results quickly?
One alternate available method is to use proc freq
with the nlevels
option and access its ods output. The one catch is that your data must be sorted by date
. If it is not, you'll need to sort your 200M row table by date
which could take much longer than a SQL statement alone. If you have to do this, try keeping only the two variables of interest in a temporary table.
proc sort data = have(keep=date id)
out = have_sorted;
by date;
run;
ods select none;
proc freq data=have_sorted nlevels;
by date;
tables id;
ods output nlevels=unique_count(keep = date nlevels
rename = (nlevels = uniqid)
);
run;
Other options include trying to tune your bufno
and bufsize
system options. Check out this case study from SANrd on how this affects I/O.
Also consider looking into the %freqyeah
macro which parallelizes proc freq
: From FREQing Slow to FREQing Fast: Facilitating a Four-Times-Faster FREQ with Divide-and-Conquer Parallel Processing