Search code examples
regex-grouplibreoffice-calc

Is there an alternative to REGEX function not exposing capturing groups?


Let's say I've strings like:

abcd efgh x23y mnop

which I want to split it in 3 parts:

  • All groups before the group containing figures
  • The group containing figures
  • The rest of the string

I can do this with this regex (online tester):

^([a-zA-Z\s]+)\s+([a-zA-Z0-9]+)\s+(.*?)\s*$

Because I have a large number of these strings in a file, I tried an adventurous experience with LibreOffice Calc and its function REGEX. The last parameter of the function is Occurrence the Number to indicate which match of Expression in Text is to be extracted or replaced.

At first I understood this parameter would give me the content of the related capturing group, but rather returns the sequence matched by the full pattern for parameter value 1, and if if matched more than once, the other parameter values would return the related sequences.

So I'm left with no formula alternative for my problem. I could use a convoluted solution which calls the text search function, but in that case I would prefer to develop a small c# program to process the file.

I've tried also SEARCH but groups are not exposed either, I think the only time groups are accessible is when using the text find/replace dialog. Is there any Calc solution not relying on a macro?


Solution

  • Groups are used in the Replacement parameter. Optional. The replacement text and references to capture groups. If there is no match, Text is returned unmodified.

    =REGEX(A1;"^([a-zA-Z\s]+)\s+([a-zA-Z0-9]+)\s+(.*?)\s*$";"All groups before the group containing figures: $1" & CHAR(10) & "The group containing figures: $2" & CHAR(10) & "The rest of the string: $3")
    

    Result:

    All groups before the group containing figures: abcd efgh
    The group containing figures: x23y
    The rest of the string: mnop