Search code examples
regexgoogle-sheetsre2

Multiple regex matches in Google Sheets formula


I'm trying to get the list of all digits preceding a hyphen in a given string (let's say in cell A1), using a Google Sheets regex formula :

=REGEXEXTRACT(A1, "\d-")

My problem is that it only returns the first match... how can I get all matches?

Example text:

"A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq"

My formula returns 1-, whereas I want to get 1-2-2-2-2-2-2-2-2-2-3-3- (either as an array or concatenated text).

I know I could use a script or another function (like SPLIT) to achieve the desired result, but what I really want to know is how I could get a re2 regular expression to return such multiple matches in a "REGEX.*" Google Sheets formula. Something like the "global - Don't return after first match" option on regex101.com

I've also tried removing the undesired text with REGEXREPLACE, with no success either (I couldn't get rid of other digits not preceding a hyphen).

Any help appreciated! Thanks :)


Solution

  • The shortest possible regex:

    =regexreplace(A1,".?(\d-)|.", "$1")
    

    Which returns 1-2-2-2-2-2-2-2-2-2-3-3- for "A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq".

    Explanation of regex:

    • .? -- optional character
    • (\d-) -- capture group 1 with a digit followed by a dash (specify (\d+-) multiple digits)
    • | -- logical or
    • . -- any character
    • the replacement "$1" uses just the capture group 1, and discards anything else

    Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex