How to write proc sql without windowfunction over partition by sum?

I just started learning SAS and realised that proc sql don't use window functions. As I am more at ease with sql I was wondering how I can simulate a sum window function in proc?

desired result

 select a.active, a.store_id, a.nbr, sum(nbr) over (partition by a.store_id)
from(select active, store_id, count(customer_id) as nbr from customer group by active, store_id) as a
;

active	store_id	nbr	sum
0	1	8	326
1	1	318	326
0	2	7	273
1	2	266	273

eg of raw data

select active, store_id, customer_id
from customer
limit 10;

active	store_id	customer_id
1	1	1
1	1	2
1	2	3
1	2	4
1	1	5
1	1	6
0	1	7
1	2	8
1	1	9
1	2	10

current result and query

    select a.active, a.store_id, a.nbr, sum(nbr) 
    from(select active, store_id, count(customer_id) as nbr from customer group by active, store_id) as a
    group by a.active, a.store_id, a.nbr;

active	store_id	nbr	sum
0	1	8	8
1	1	318	318
0	2	7	7
1	2	266	266

Solution

You can do the equivalent in proc sql by merging two sub-queries: one for the count of customers by active, store_id, and another for the total customers for each store_id.

proc sql noprint;
    create table want as
        select t1.active
             , t1.store_id
             , t1.nbr
             , t2.sum
        from (select active
                   , store_id
                   , count(customer_id) as nbr
              from have 
              group by store_id, active
             ) as t1

        LEFT JOIN

             (select store_id
                   , count(customer_id) as sum
              from have
              group by store_id
             ) as t2

        ON t1.store_id = t2.store_id
    ;
quit;

If you wanted to do this in a more SASsy way, you can run proc means and merge together the results from a single dataset that holds everything you need. proc means will calculate all possible combinations of your variables by default.

proc means data=have noprint;
    class store_id active;
    ways 1 2;

    output out=want_total
        n(customer_id) = total
    ;
run;

data want;
    merge want_total(where=(_TYPE_ = 3) rename=(total = nbr) )
          want_total(where=(_TYPE_ = 2) rename=(total = sum) keep=_TYPE_ store_id total)
    ;
    by store_id;

    drop _:;
run;

Or, in SQL:

proc sql;
    create table want as
        select t1.store_id
             , t1.active
             , t1.total as nbr
             , t2.total as sum
        from want_total as t1
        LEFT JOIN
             want_total as t2
        ON t1.store_id = t2.store_id
        where     t1._TYPE_ = 3
              AND t2._TYPE_ = 2
    ;
quit;

The _TYPE_ variable identifies the level of the analysis. For example, _TYPE_ = 1 is for active only, _TYPE_ = 2 is for store_id only, and _TYPE_ = 3 is for all combinations. You can view this in the output dataset from proc means:

store_id    active  _TYPE_  _FREQ_  total
.           0       1        3      3
.           1       1        7      7
1           .       2        6      6
2           .       2        4      4
1           0       3        1      1
1           1       3        5      5
2           0       3        2      2
2           1       3        2      2

And if you wanted faster high-performance results, check out its big sibling, proc hpsummary.

Therein lies the cool thing about SAS: You can bounce between PROCs, SQL, the DATA Step, and Python via Pandas/proc python. You can exploit the unique benefits of each of these methods and thought processes for any number of data engineering and statistics problems.