Search code examples
regexgoogle-sheetsgoogle-sheets-formulare2

Extract all matches to neighboring column cell in Google Sheets


Question: looking for Google Sheets regex that captures all instances of a string between [t- ] and outputs to the neighboring column cell as an array, or some other delimiter between matches.

For the following string, I'm trying to extract all instances of text between [t- ].

A1:

Lorem Ipsum [t- is simply] dummy text of the [t- printing] and typesetting [c- industry], so [d- it would make] sense that

Expected Output is an array of all occurrences in a single column:

B1:

[is simply, printing]

Or output could be any delimiter of match occurrences

is simply | printing

Trying the following with a single text within [t- ] works fine, but for multiple instances it extracts everything between open [t- of first occurrence and ] of last occurrence:

=REGEXEXTRACT(A1,"\[t- (.*)\]")

Leading to:

is simply]! dummy text of the [t- printing

I've also tried multiple capture groups, but this only works if I'm sure there's only two instances of text between [t- ]— there could be n instances per row. Also it doesn't output results to an arrary in one column, but spreads across multiple columns:

=regexextract(A1, "(\[t- (.*)\]).*(\[t- (.*)\])" )


EDIT: I've received a couple answers with Regex that works for other tools/languages (e.g., PHP or Javascript) but not Google Sheets. Here's the Google Sheets Regex Syntax.

EDIT 2: The above sample string has other text inside brackets marked with other letters, e.g., [c- industry] and [d- it would make]. These should not be included. Only texted in [t- ] (with a "t-") should be returned.


Solution

  • Similar to the technique used here,

    • REGEXREPLACE all [t-.*] to (.*)
    • Provide the resulting expression from above as the regex to REGEXEXTRACT
    • \Q..\E is used to escape other characters

      =REGEXEXTRACT(A1, "\Q"&REGEXREPLACE(A1,"\[t-[^]]+\]","\\E(.*)\\Q")&"\E")
      

    Alternatively, With just REGEXREPLACE to delimit,

    =REGEXREPLACE(A1,"(^|\])(.*?)(\[t-|$)","$1|$3")
    =SPLIT(REGEXREPLACE(A1,"(^|\])(.*?)(\[t-|$)","$1|$3"),"|")
    

    Replace all characters .* that

    • Start with ] or start of string ^
    • End with [t- or end of string $

    with ]|[t-