Search code examples
google-sheetsgoogle-sheets-formula

QUERY strings using arrayformula


I need an arrayformula solution to determine the first row number in which my search term exists. I have a column of strings which represents the data to be searched (Column E below) and I have a column of single word search terms (Column A below).

Column A Column E
1 rain All terrain, and, no pain
2 sun There's 2 much, rain, in Spain
3 blue The sky, was, blue, on, Sunday
4 Purple Clouds, are pink around here
5 spot Red sunspot at night is alarming
6 3 Yellow, sun spots
7 Purple rain
8 3 spot of rain
9 5 583334378

The result I'm looking for is:

Column A Column B
rain 2
sun 6
blue 3
Purple 7
spot 8
3 6
5 Nope

An example sheet is here.

I've tried various different methods but the most successful has been the QUERY function below.

=if(A2<>"",iferror(query({arrayformula(row($E$2:$E)-1),$E$2:$E}, "select Col1 where Col2 matches '.*\b"&A2&"\b.*' limit 1"),"Nope"),)

My problem with both VLOOKUP and MATCH is that I couldn't find a solution that provided an index on a partial match within a string, only whole cell matches.

I've also tried a TRANSPOSE/SPLIT pairing but that seemed to remove my ability to determine the originating row - which is the bit I want. A helper column in a different sheet expanded the row count exponentially and I think it broke the sheet. The strings are typically a total length of 100+ chars each (including spaces and punctuation, no line feeds).

The remaining problem with QUERY solution is that it's a 'Copy Down' solution only and I need the arrayformula version. I tried this but it doesn't traverse the array.

=ArrayFormula(if($A$2:$A<>"",iferror(query({arrayformula(row($E$2:$E)-1),$E$2:$E}, "select Col1 where Col2 matches '.*\b"&$A$2:$A&"\b.*' limit 1"),"Nope"),))

I tried this from Player0 but it provides an incomplete result in my sample sheet. I can't fathom out why it misses certain search terms in the VLOOKUP part when the data clearly includes it in its REGEXEXTRACT output. I did use a helper column in G here but it misses the last row of the sheet I don't think I can allow (can't test it). ... and it also seems to hang the sheet with the real world data.

={"ArrayFormula2"; ARRAYFORMULA(IFNA(VLOOKUP($A$2:$A,{REGEXEXTRACT(""&$F$2:$F, "\b"&TEXTJOIN("\b|\b", 1, $A$2:$A)&"\b"), $G$2:$G*1}, 2, 0)))}

The requirement for a single cell formula is that this is actually going to be the second part of an existing arrayformula. If the first expression in my combined formula fails then the formula I learn here (hopefully) will be called. In my second QUERY example above I've emulated the pass/fail component with a simple 'if not empty' test.

The data set of search terms is approximately 25K-30K rows. The column of strings is approximately 5K to 10K rows. Both arrays are dynamic in that they can have their data changed at any time and rows can be added or deleted - in other words neither row count can be determined without manual interaction - something I am keen to avoid.


Solution

  • Here's one approach you may test out:

    =map(A2:A,lambda(Σ,if(Σ="",,ifna(+filter(row(F2:F)-1,--regexmatch(F2:F&"","\b"&Σ&""&"\b")=1),"Nope"))))
    

    enter image description here