Search code examples
sqlsassas-macro

Where statement searching for 2 values in any combination of multiple variables in SAS


I am trying to search for a double condition using Proc SQl. I want to search for when 'TTT' AND 'RRR' exist in v1,v2,v3,v4...v10 so it can be that TTT is in V1 and RRR is in v10 or in V1 and V2. There are many combinations but I don't want to type each combination out obviously. At the same time I want to also use an OR statement to search for variables from v1-v10 that either contain ('TTT' AND 'RRR') OR ('GGG') alone. I've been searching around, thought maybe a case when would work but I don't think so and also I need to make it a PROC SQL.

I know the below code is wrong as it's a longer version but, just to get the jist of what I mean:

WHERE date BETWEEN &start. AND &end. 
AND ( ((V1 = 'TTT' and V2 ='RRR') OR (V1 = 'GGG')) OR ((V1 = 'TTT' and V3 ='RRR') OR (V1 = 'GGG')) OR ((V1 = 'TTT' and V4 ='RRR') OR (V1 = 'GGG')) ...)

Thanks, Much Appreciated!


UPDATED version based on @Tom's Answer

 data
diag_table;
input v1 $ v2 $ v3 $ v4 $ v5 $;
cards;
TTT . . RRR .
GGG . . . .
. RRR . . TTT
. . . . .
FFF . . . .
. . RRR1 . .
TTT . . GGG .
. RRR . GGG .
run;
proc print data=diag_table;
quit;

proc sql;
create table diag_found  as
select *
from diag_table
WHERE (whichc('TTT',v1,v2,v3,v4,v5) and whichc('RRR',v1,v2,v3,v4,v5)) or (whichc('GGG',v1,v2,v3,v4,v5));
quit;
proc print data=diag_found;
quit;

enter image description here

The only problem with this code is that it's also grabbing cases where the rows contain GGG + RRR, and GGG + TTT I tried adding parentheses around the two groups but it didn't change anything.

UPDATE: @Tom and @Joe:

Yes, I guess I do mean an XOR with an AND inside.

So if A=TTT B=RRR C=GGG (A AND B) XOR C

Either A+B combination OR C

enter image description here

Thank you!!


Solution

  • You can use the WHICHC() function to do what you want.

    where whichc('TTT',v1,v2) and whichc('RRR',v1,v2)
    

    Note it is a little harder to use in SQL since you cannot use variable lists so you will need to explicitly list each variable name, whichc('TTT',v1,v2,v3,v4,v5), instead of just using a variable list,whichc('TTT', of v1-v5) , like you could in a data step.

    Not sure what you mean by GGG alone. But if by that you mean GGG without TTT or RRR then you could use logic like this.

    where (whichc('TTT',v1,v2) and whichc('RRR',v1,v2))
       or (whichc('GGG',v1,v) and not (whichc('TTT',v1,v2) or whichc('RRR',v1,v2)))