
repeated, arbitrary capture groups


Given a string, e.g.:

static_string.name__john.id__6.foo__bar.final_string

but with an arbitrary number of label__value. components, how can I repeat the capture groups, split each component into its label and value, and also capture the terminating final_string?

For the above, I'd want: [name, john, id, 6, foo, bar, final_string]

Is something like this possible when I don't know the number of label__value. components in advance?

This is for Go (golang) / RE2, if that matters.

Update: I don't have the luxury of doing this in a few lines of code; it has to be a single regex. The regex is defined in a config file for an application I don't control, so a code-based loop with conditionals, etc., is unfortunately not possible.
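
Why a single repeated group is not enough: in RE2, and therefore in Go's regexp package, a capture group inside a repetition only reports the text from its last iteration. A minimal Go sketch, assuming the example string from the question (the variable name naive is just illustrative), shows the earlier pairs being lost:

```go
package main

import (
	"fmt"
	"regexp"
)

// A naive attempt: one label/value capture pair inside a repeated (*) group.
// RE2 (and Go's regexp) keeps only the text from the LAST iteration of a
// repeated group, so earlier label/value pairs are lost.
var naive = regexp.MustCompile(`^[^.]+(?:\.([^.]+?)__([^.]+))*\.([^.]+)$`)

func main() {
	m := naive.FindStringSubmatch("static_string.name__john.id__6.foo__bar.final_string")
	// m[0] is the whole match; m[1:] are the capture groups.
	fmt.Println(m[1:]) // [foo bar final_string]
}
```

Because the number of capture groups in a pattern is fixed, getting every pair back requires one capture-group pair per label__value component.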


Solution

  • The right approach depends heavily on what the application you are putting the regex into expects.

    This answer focuses on getting you the capture groups in a basic way, while trying to avoid issues with RE2 and with the application consuming the regex.

    Note: final_string may not end up at the capture group index you expect with this method. It is always captured by the last group, so its index grows with the number of key/value slots in the regex, and unused slots come back as empty captures. How much that matters again depends on what you are putting the regex into.

    A regular expression that matches zero or one key/value pair is:

    ^[^.]+(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+))$

    It matches both of these inputs:

    • static_string.final_string
    • static_string.name__john.final_string
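
A quick way to check this pattern against both inputs is a small Go program using the standard regexp package. This is a sketch; the variable name zeroOrOne is just illustrative. FindStringSubmatch returns an empty string for a group that did not take part in the match:

```go
package main

import (
	"fmt"
	"regexp"
)

// zeroOrOne is the regex for "no" or "one" label__value pair:
// group 1 = label, group 2 = value, group 3 = final_string.
var zeroOrOne = regexp.MustCompile(`^[^.]+(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+))$`)

func main() {
	for _, s := range []string{
		"static_string.final_string",
		"static_string.name__john.final_string",
	} {
		m := zeroOrOne.FindStringSubmatch(s)
		// %q makes empty (unused) groups visible as "".
		fmt.Printf("%q\n", m[1:])
	}
	// Prints:
	// ["" "" "final_string"]
	// ["name" "john" "final_string"]
}
```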

    To support one more key/value pair, repeat this part of the regular expression:

    (?:\.([^.]+?)__([^.]+))?
    

    So to support two key/value pairs, the regular expression is:

    ^[^.]+(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+))$
    

    This now supports the following additional example:

    • static_string.name__john.foo__bar.final_string
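
The same kind of sketch for the two-slot pattern shows how the slots are filled. Go's regexp returns the submatches a backtracking engine would have chosen, so the optional slots fill from the left, an unused slot comes back as "", and final_string stays in the last group (group 5 here). The name twoSlots is illustrative:

```go
package main

import (
	"fmt"
	"regexp"
)

// twoSlots has two optional label__value slots.
// Groups: 1,2 = first pair; 3,4 = second pair; 5 = final_string.
var twoSlots = regexp.MustCompile(`^[^.]+(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+))$`)

func main() {
	for _, s := range []string{
		"static_string.name__john.foo__bar.final_string",
		"static_string.name__john.final_string",
	} {
		m := twoSlots.FindStringSubmatch(s)
		fmt.Printf("%q\n", m[1:])
	}
	// Prints:
	// ["name" "john" "foo" "bar" "final_string"]
	// ["name" "john" "" "" "final_string"]

	// Two slots * 2 groups + 1 group for final_string.
	fmt.Println(twoSlots.NumSubexp()) // 5
}
```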

    Expanding that out to support up to 12 key/value pairs gives:

    ^[^.]+(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+))$
    

    This supports the following additional examples:

    • static_string.name__john.id__6.foo__bar.final_string
    • static_string.name2_1b__john.id__6.foo__bar.final_string
    • static_string.name__john.id__6.foo__bar.name__john.id__6.foo__bar.name__john.id__6.foo__bar.name__john.id__6.foo__bar.final_string