Search code examples
sasgroup

SAS: Determine if value is in list of values grouped by two variables


I am looking for something that will group a dataset by ID1 and ID2, then go have a look if Start2 is present within Start1 for each group. I have tried using proc sql, but I can not get the grouping to work properly. See example from below:

ID1 |  ID2  |    Start1     |    Start2    |
 1  |   1   |  01-05-2015   |  01-10-2015  | Both values of Start2 are in Start1 for ID1 = 1 and ID2 = 1 and should not output. 
 1  |   1   |  01-10-2015   |  01-05-2015  | Both values of Start2 are in Start1 for ID1 = 1 and ID2 = 1 and should not output.  
 1  |   2   |  01-03-2017   |  01-09-2017  | Should output
 1  |   2   |  01-03-2017   |  01-06-2017  | Should output
 1  |   2   |  01-03-2017   |  01-03-2017  | 
 2  |   1   |  01-05-2015   |  01-10-2016  |
 2  |   1   |  01-05-2015   |  01-05-2015  |
 2  |   1   |  01-05-2015   |  01-07-2015  | Edited: Should output
 2  |   1   |  01-10-2016   |  01-10-2016  |
 2  |   1   |  01-10-2016   |  01-05-2015  |
 2  |   1   |  01-10-2016   |  01-07-2015  | Edited: Should output

So an output looking like: 

ID1 |  ID2  |    Start1     |    Start2    |
 1  |   2   |  01-03-2017   |  01-09-2017  |
 1  |   2   |  01-03-2017   |  01-06-2017  |
 2  |   1   |  01-05-2015   |  01-07-2015  |
 2  |   1   |  01-10-2016   |  01-07-2015  |

Any help would be very appreciated! My current attempt looks something like this (inspired by Egor below):

proc sql;
   create table want as 
   select *
   from have
   group by ID1, ID2
   having not Start2 in (
        select Start1
        from have
        group by ID1, ID2)
;

This seems to also give the desired output, however, when I test it on my fullsize dataset, I can see that this is missing certain rows that should be outputted.

Sample data:

data a;
infile cards dlm="|";
format Start1 Start2 DDMMYY10.;
input id1:Best. id2:Best. Start1:DDMMYY10. Start2:DDMMYY10.;
cards;
 1  |   1   |  01-05-2015   |  01-10-2015  
 1  |   1   |  01-10-2015   |  01-05-2015  
 1  |   2   |  01-03-2017   |  01-09-2017  
 1  |   2   |  01-03-2017   |  01-06-2017  
 1  |   2   |  01-03-2017   |  01-03-2017  
 2  |   1   |  01-05-2015   |  01-10-2016  
 2  |   1   |  01-05-2015   |  01-05-2015  
 2  |   1   |  01-05-2015   |  01-07-2015  
 2  |   1   |  01-10-2016   |  01-10-2016  
 2  |   1   |  01-10-2016   |  01-05-2015  
 2  |   1   |  01-10-2016   |  01-07-2015  
 ;
 run;

Solution

  • Try 2

    Input data:

    data a;
    infile cards dlm="|";
    format Start1 Start2 DDMMYY10.;
    input id1:Best. id2:Best. Start1:DDMMYY10. Start2:DDMMYY10.;
    cards;
     1  |   1   |  01-05-2015   |  01-10-2015  
     1  |   1   |  01-10-2015   |  01-05-2015  
     1  |   2   |  01-03-2017   |  01-09-2017  
     1  |   2   |  01-03-2017   |  01-06-2017  
     1  |   2   |  01-03-2017   |  01-03-2017  
     2  |   1   |  01-05-2015   |  01-10-2016  
     2  |   1   |  01-05-2015   |  01-05-2015  
     2  |   1   |  01-05-2015   |  01-07-2015  
     2  |   1   |  01-10-2016   |  01-10-2016  
     2  |   1   |  01-10-2016   |  01-05-2015  
     2  |   1   |  01-10-2016   |  01-07-2015  
     ;
     run;
    
    

    Solution:

    proc sql;
       create table c as 
       select distinct id1, id2, Start1
       from a
    ;
     proc sql;
       create table b as 
       select id1, id2, Start1, Start2
       from a
       where Start2 not in (select Start1 from c where a.id1=c.id1 and a.id2=c.id2)
    ;
    

    Result:

    1   2   01/03/2017  01/09/2017
    1   2   01/03/2017  01/06/2017
    2   1   01/05/2015  01/07/2015
    2   1   01/10/2016  01/07/2015