I'm trying to query/filter rows from a dataset structured like this:
Creator | Title | Barcode | Inv. No. |
---|---|---|---|
springer | Cellbio | 014678 | POL02P14x |
springer | Cellbio | 026938 | POL02P26r |
springer | Cellbio | 038745 | |
nature | Cellular | 026672 | POL02P26h |
elsevier | Biomed | 026678 | POL02P26g |
elsevier | Biomed | 026678 | POL02P26g |
spring | Cellbit | POL02P147 | |
spring | Cellbit | 026938 | POL02P26j |
spring | Cellbit | 038745 |
I need to return all rows where the value/string in column B(title) is duplicate and when in those duplicate rows at least one string/value in column C(barcode) starts with 014 and at least one starts with 026. If the criteria is not met in column C the next check would be similar in column D (Inv. no.): at least one value string starts with POL02P14 and at least one starts with POL026.
So the basic logic would be something like this:
Select all rows where B is duplicate and ((at least one value in C starts with x and one with y) or ( at least one value in D starts with z and one with W)).
So the desired output should be like this:
Creator | Title | Barcode | Inv. No. |
---|---|---|---|
springer | Cellbio | 014678 | POL02P14x |
springer | Cellbio | 026938 | POL02P26r |
springer | Cellbio | 038745 | |
spring | Cellbit | POL02P147 | |
spring | Cellbit | 026938 | POL02P26j |
spring | Cellbit | 038745 |
Here is a sample spreadsheet more similar to the actual dataset which is fairly large:
https://docs.google.com/spreadsheets/d/1xj5LnOxIwEmcjnXD0trmvcCKJIGIcfDkARV80Hx5Fvc/edit?usp=sharing
Tried adapting formulas with similar logic but always getting errors or unexpected results either the query logic/syntax is wrong or there is filter/array dimension mismatch.
Some examples(the column references are mixed up here because i was trying to reduce the number of columns) :
=FILTER(query(list!A1:AR, "Select * where C starts with 'POL02P'"), list!B1:B<>"",COUNTIF(list!B1:B,list!B1:B)>1)
={results!A1:AR1;array_constrain(
query(
{Filter({results!A2:AR,results!AR2:AR},REGEXMATCH(results!D2:D, "^POL02P14|POL02P26"));
countif(index(Filter({results!A2:AR,results!AR2:AR},REGEXMATCH(results!D2:D, "^POL02P14|POL02P26")),0,45),
index(Filter({results!A2:AR,results!AR2:AR},REGEXMATCH(results!D2:D, "^POL02P14|POL02P26")),0,45))}
,"Select * where Col46>1")
,9^9,44)}
=query(FILTER({list!A2:A&list!J2:J,list!A2:J,
iferror(
vlookup(list!A2:A&list!J2:J,query(query(filter(list!A2:A&
list!J2:J,REGEXMATCH(list!C2:C, "^POL02P14|POL02P26")),
"select Col4, count(Col4) where Col4 <> '' group by Col4"),
"select Col4 where Col2 >1 "),1,false))},REGEXMATCH(list!C2:C, "^POL02P14|POL02P26")),
"select Col1, Col2, Col3, Col5, Col6, Col7, Col8, Col9, Col10, Col11 where Col12 <> ''
order by Col3 asc, Col11 asc")
Please try this out in your sample sheet:
={results!A1:AR1;FILTER(results!A2:AR,REGEXMATCH(results!B2:B,JOIN("|","^"&LAMBDA(z,LAMBDA(x,y,z,{filter(filter(x,y="014"),xmatch(filter(x,y="014"),filter(x,y="026")));filter(filter(x,z="POL02P14"),xmatch(filter(x,z="POL02P14"),filter(x,z="POL02P26")))})(INDEX(z,,1),INDEX(z,,2),INDEX(z,,3)))((UNIQUE(FILTER({results!B2:B,LEFT(results!C2:C,3),LEFT(results!D2:D,8)},results!B2:B<>"",results!D2:D<>""))))&"$")))}
formula logic at a glance:
TRUE
for either step 2 OR step 3 above