regex pcre

How can I match a repeating pattern within a repeating pattern without specifying the alternatives twice?


Suppose I have the following string, representing one or many days or day ranges:

mon,thu..fri,sun

How can I match any arbitrary list of ranges or single days with a regular expression, without expanding the day alternatives twice?

I currently have this:

(?P<weekdays>
    (                        
        \b
        (mon|tue|wed|thu|fri|sat|sun)
        (\.\.(mon|tue|wed|thu|fri|sat|sun))?
        ,?
    )*
)

... this works, but it forces me to repeat the day alternatives in the regex (they are simplified here; the real ones are longer). Note that this regex also matches fri,sat, with a trailing comma; the optional trailing comma IS the desired behavior.

I also tried making the range portion a limited repetition using {1,2}, but I could not avoid matching the invalid mon..tue..fri, because the comma is optional and the pattern can restart without it.

Note that this is part of a longer regex so I can't use the global flag.

This is the Regex101 URL, where I also added some unit tests.

Small edit: used the \b metacharacter instead of a negative lookahead.


Solution

  • You can use a PCRE named group and reuse its sub-pattern later with the (?&groupName) subroutine-call construct:

    ^(?<weekdays>
        (                        
            \b
            (?<weeks>mon|tue|wed|thu|fri|sat|sun)
            (?:\.\.(?&weeks))?
            ,?
        )+
    )$
    

    RegEx Demo
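
One point worth making explicit: (?&weeks) is a subroutine call, so it re-runs the sub-pattern and can match a different day than the one the group captured. A backreference, by contrast, only matches the same text again. The sketch below shows the backreference side of that contrast in Python, whose standard re module supports named backreferences, (?P=name), but not (?&name). The day list is the simplified one from the question, and the variable names are made up for illustration.

```python
import re

# Backreference version: (?P=weeks) must repeat the exact text captured by "weeks".
backref = re.compile(r"\b(?P<weeks>mon|tue|wed|thu|fri|sat|sun)\.\.(?P=weeks)\b")

same_day = backref.fullmatch("mon..mon")  # matches: second day equals the first
diff_day = backref.fullmatch("mon..fri")  # None: "fri" is not the captured "mon"

print(same_day is not None, diff_day is not None)  # True False
```

This is why the range part needs a subroutine call rather than a backreference: mon..fri has to be accepted.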


    To keep the definition separate from its references, use PCRE's DEFINE construct:

    (?(DEFINE)
       (?<weeks>mon|tue|wed|thu|fri|sat|sun)
    )
    ^(?P<weekdays>
        (?:                        
            \b
            (?&weeks)
            (?:\.\.(?&weeks))?
            ,?
        )*
    )$
    

    RegEx Demo 2
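
If the target engine has no subroutine calls or DEFINE (Python's standard re module is one such case), the same "define once, reference twice" idea can be approximated by composing the pattern from a string. This is a different technique from the PCRE answer above and is only a sketch; the names DAY, WEEKDAYS and is_weekday_list are hypothetical.

```python
import re

# Single definition of the day alternatives (simplified list from the question).
DAY = r"(?:mon|tue|wed|thu|fri|sat|sun)"

# One item is a day or a day..day range, followed by an optional comma.
# "+" requires at least one item, as in the first pattern above.
WEEKDAYS = re.compile(
    rf"""
    (?P<weekdays>
        (?:
            \b
            {DAY}
            (?:\.\.{DAY})?
            ,?
        )+
    )
    """,
    re.VERBOSE,
)


def is_weekday_list(text: str) -> bool:
    """Return True if the whole string is a list of days or day ranges."""
    return WEEKDAYS.fullmatch(text) is not None


print(is_weekday_list("mon,thu..fri,sun"))  # True
print(is_weekday_list("fri,sat,"))          # True (trailing comma allowed)
print(is_weekday_list("mon..tue..fri"))     # False
```

The trade-off is that the alternatives are expanded twice in the compiled pattern, but they are written only once in the source, which is what the question asks for.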