regex, google-sheets, concatenation, google-sheets-formula, textjoin

Regex to parse words from a cell in Google Sheets


Hoping somebody can help me with the syntax.

Cell A1 contains the following value:

Content Type: Pitch Decks; Region: North America; Content Type: Data Sheet;

I'm trying to create a formula that parses the values between the words "Content Type: " and the semicolon that follows.

So in B1, the expected result of the formula should be:

Pitch Decks
Data Sheet

(Yes, with char(10) separating the values in B1)

I've tried a number of different combinations of REGEXEXTRACT, for example:

REGEXEXTRACT(A1,"Content Type: ([\w\/\s]*);?")

...only returns:

Pitch Decks

I've read that global modifiers are not supported in Google Sheets. That formula works when I add a global modifier in a regex test harness, but not when used within Google Sheets itself.
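To illustrate the first-match versus global distinction described above, here is a minimal Python sketch. It uses Python's `re` module rather than Google Sheets, and the variable names are only illustrative. `re.search` stops at the first match, much like REGEXEXTRACT, while `re.findall` behaves like a regex tester with the global modifier switched on.

```python
import re

# Same text as cell A1 in the question.
cell = "Content Type: Pitch Decks; Region: North America; Content Type: Data Sheet;"

# Same pattern as in the question (the slash needs no escaping in Python).
pattern = r"Content Type: ([\w/\s]*);?"

# First match only: comparable to what REGEXEXTRACT returns.
first = re.search(pattern, cell).group(1)

# Every match: comparable to a regex tester with the global modifier.
all_values = re.findall(pattern, cell)

print(first)       # Pitch Decks
print(all_values)  # ['Pitch Decks', 'Data Sheet']
```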


Solution

  • I thought this would be easy with substitution and a lookahead, but it turns out Google Sheets doesn't support lookaheads either. After some research, I came up with the following solution:

    =join(CHAR(10),REGEXEXTRACT(A1,REGEXREPLACE(A1,"(Content Type: )([^;]+)","$1($2)")))
    

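    The inner REGEXREPLACE wraps every value that follows "Content Type: " (up to the next semicolon or the end of the string) in parentheses, so the text of A1 itself becomes a pattern with one capture group per value. REGEXEXTRACT then matches A1 against that pattern and returns each captured group, and JOIN combines them with a line feed. To join with a different character, replace CHAR(10) with, for example, "," to join with a comma. Because the cell text doubles as the pattern, this can break if A1 contains regex special characters such as parentheses.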

    Output:

    Pitch Decks
    Data Sheet
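
The two-step trick in the formula can be mirrored outside Sheets. The following is a minimal Python sketch, not the Sheets implementation: the function name `extract_content_types` is hypothetical, and Python's `re` module stands in for REGEXREPLACE and REGEXEXTRACT. It assumes the cell text contains no regex special characters.

```python
import re


def extract_content_types(cell: str) -> str:
    """Return the values after "Content Type: ", joined by line feeds."""
    # Step 1 (mirrors REGEXREPLACE): wrap each value in parentheses so the
    # cell text itself becomes a pattern with one capture group per value.
    pattern = re.sub(r"(Content Type: )([^;]+)", r"\1(\2)", cell)

    # Step 2 (mirrors REGEXEXTRACT): match the cell against that pattern.
    # Assumption: the cell holds no regex metacharacters; otherwise the
    # generated pattern may fail to match or fail to compile.
    match = re.search(pattern, cell)
    if match is None:
        return ""

    # Step 3 (mirrors JOIN with CHAR(10)): join the captured groups.
    return "\n".join(match.groups())


cell = "Content Type: Pitch Decks; Region: North America; Content Type: Data Sheet;"
result = extract_content_types(cell)
print(result)
```

Printing `result` gives "Pitch Decks" and "Data Sheet" on separate lines, matching the output of the formula above.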