regex pcre

Match global, but only if line starts with a specific string


I feel like this should be very simple, and I'm missing a single, important bit.

Example: https://regex101.com/r/lXh5Vj/1

Regex, using /m/g flags:

^GROUPS.*?"(?<name>[^:]+):(?<id>\d+)"

Test string:

GROUPS: ["group1:44343", "group2:23324", "group3:66567"]
USERS: ["user1:44343", "user2:23324", "user3:66567"]

My current regex matches only group1, because only that entry is directly preceded by "GROUPS". I interpret this as global matching resuming the search only after the end of the previous match: since there is no "GROUPS" between group1 and group2, group2 does not match. If I alter the test string and add "GROUPS" before group2, it also matches, which supports my suspicion. But I do not know how to make every global match take the "GROUPS" at the start of the line into account.

The regex should produce three matches in the first line (three names and three ids), and none in the second. If I remove the "GROUPS" part from the regex, the groups are matched just fine, but the pattern then also matches the second line, which I do not want.
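For reference, the behaviour described above can be reproduced outside regex101. This is a minimal sketch in Python, assuming the standard `re` module, which spells named groups `(?P<name>...)` instead of `(?<name>...)`. The variable names are illustrative only.

```python
import re

text = (
    'GROUPS: ["group1:44343", "group2:23324", "group3:66567"]\n'
    'USERS: ["user1:44343", "user2:23324", "user3:66567"]\n'
)

# The original pattern: every match must begin at a line start with GROUPS.
anchored = re.compile(r'^GROUPS.*?"(?P<name>[^:]+):(?P<id>\d+)"', re.MULTILINE)

# The same pattern without the GROUPS prefix.
unanchored = re.compile(r'"(?P<name>[^:]+):(?P<id>\d+)"')

anchored_hits = anchored.findall(text)
unanchored_hits = unanchored.findall(text)

print(anchored_hits)         # [('group1', '44343')]
print(len(unanchored_hits))  # 6, i.e. the USERS line is matched as well
```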


Solution

  • If you want to match GROUPS: [" at the start of a line (with the m flag) and capture the name:id parts in named groups, you can make use of the \G anchor.

    (?:^GROUPS:\h*\["(?=[^][]*])|\G(?!^),\h*")(?<name>[^:]+):(?<id>\d+)"
    
    • (?: Non capture group
      • ^GROUPS:\h*\[ Start of the line (with the m flag), match GROUPS:, optional horizontal whitespace and [
      • "(?=[^][]*]) Match " and assert a closing ] to the right, with no other [ or ] before it
      • | Or
      • \G(?!^),\h*" Assert the position at the end of the previous match, but not at the start of a line (to get consecutive groups), and match a comma, optional horizontal whitespace and "
    • ) Close non capture group
    • (?<name>[^:]+) Named group name, match 1+ times any char except :
    • : Match literally
    • (?<id>\d+) Named group id, match 1+ digits
    • " Match literally
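The \G anchor is not available everywhere. Python's built-in `re` module, for example, supports neither \G nor \h. A hedged alternative there is a two-step approach, which is a different technique from the pattern above: first find the bracketed list on lines that start with GROUPS:, then extract the name:id pairs from that list only. The names `line_re`, `pair_re` and `group_pairs` are hypothetical, and `[ \t]` is used as a rough stand-in for \h.

```python
import re

text = (
    'GROUPS: ["group1:44343", "group2:23324", "group3:66567"]\n'
    'USERS: ["user1:44343", "user2:23324", "user3:66567"]\n'
)

# Step 1: capture the contents of [...] on lines that start with GROUPS:
line_re = re.compile(r'^GROUPS:[ \t]*\[([^\]\[]*)\]', re.MULTILINE)

# Step 2: pull each "name:id" pair out of the captured list.
pair_re = re.compile(r'"(?P<name>[^:"]+):(?P<id>\d+)"')


def group_pairs(source):
    """Return (name, id) tuples found on GROUPS lines only."""
    pairs = []
    for line in line_re.finditer(source):
        for m in pair_re.finditer(line.group(1)):
            pairs.append((m.group("name"), m.group("id")))
    return pairs


print(group_pairs(text))
# [('group1', '44343'), ('group2', '23324'), ('group3', '66567')]
```

The trade-off is two passes instead of one, but each pattern stays simple and the approach works in flavors without \G.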
