Search code examples
countsasflagsproc-sqldatastep

SAS Counting Occurrences based on multiple layers within set time period


I am trying to count occurrences where the same person was billed for an item, four or more times, by the same place within 30 days of each instance. For example, input would look something like:

person   service  place  date
A         x       shop1  01/01/15
A         x       shop1  01/15/15
A         x       shop1  01/20/15
B         y       shop2  03/20/15 
B         y       shop2  04/01/15
C         z       shop1  05/05/15

And output would look something like:

person   service  place  date      count
A         x       shop1  01/01/15  3
A         x       shop1  01/15/15  3
A         x       shop1  01/20/15  3
B         y       shop2  03/20/15  2 
B         y       shop2  04/01/15  2
C         z       shop1  05/05/15  1

I have tried stuff similar to:

data work.want;
 do _n_ =1  by 1 until (last.PLACE);  
   set work.rawdata;
   by PERSON PLACE;
   if first.PLACE then count=0;
   count+1;
 end;
 frequency= count;  
 do _n_ = 1 by 1 until (last.PLACE); 
   set work.rawdata;
   by PERSON PLACE;
   output;
 end;
run;

this gives a count based on person and place but does not factor in time. Any help or suggestions would be greatly appreciated! Thank you


Solution

  • This can be done easily with proc sql...

    Your data:

    data have;
    input person $ service $ place $;
    datalines;
    A         x       shop1
    A         x       shop1
    A         x       shop1
    B         y       shop2
    B         y       shop2
    C         z       shop1
    ;
    run;
    

    Then we count the occurences of "place" for each 1,2 group, and join the original table.

    proc sql;
    create table want as
    select a.*, b._count
    from have as a
    inner join 
    (
        select person, service, count(place) as _count
        from have 
        group by 1,2
    ) as b
    on a.person = b.person
    and a.service = b.service
    ;
    quit;
    

    Is there a date field? We need it in order to group the data by month (or 30 days), for example.