regex, pcre, semgrep

regex matching duplicates in a comma separated list


I'm trying to use a PCRE tool to match any duplicate words (alphanumeric, possibly containing dashes) in a comma-separated list in some YAML.

I have found a regex that matches consecutive duplicates:

(?<=,|^)([^,]*)(,\1)+(?=,|$)

It catches the duplicates in:

hello-world,hello-world,goodbye-world,goodbye-world

but not the two hello-world entries in:

hello-world,goodbye-world,goodbye-world,hello-world

Could someone help me build a regex pattern for the second case (or for both cases)?
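To make the difference between the two inputs concrete, here is a minimal sketch that runs the question's consecutive-duplicate pattern against both example strings. Python is an assumption (the question targets a PCRE tool), and Python's `re` only accepts fixed-width lookbehind, so `(?<=,|^)` is rewritten as the equivalent `(?:(?<=,)|^)`. The function name `consecutive_duplicates` is hypothetical.

```python
import re

# Port of the question's consecutive-duplicate pattern to Python's `re`.
# Python only allows fixed-width lookbehind, so PCRE's (?<=,|^) is written
# as (?:(?<=,)|^): "preceded by a comma, or at the start of the string".
CONSECUTIVE_DUP = re.compile(r"(?:(?<=,)|^)([^,]*)(,\1)+(?=,|$)")


def consecutive_duplicates(csv: str) -> list:
    """Return each value that is immediately repeated in a comma-separated list."""
    return [m.group(1) for m in CONSECUTIVE_DUP.finditer(csv)]


if __name__ == "__main__":
    print(consecutive_duplicates("hello-world,hello-world,goodbye-world,goodbye-world"))
    # ['hello-world', 'goodbye-world']
    print(consecutive_duplicates("hello-world,goodbye-world,goodbye-world,hello-world"))
    # ['goodbye-world']
```

Only goodbye-world is reported for the second string, because `(,\1)+` requires the repeated copy to come immediately after the captured value.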


Solution

  • You may use this regex:

    (?<=,|^)([^,]+)(?=(?>,[^,]*)*,\1(?>,|$)),
    


    RegEx Details:

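    • (?<=,|^): Assert that the current position is preceded by a comma or is the start position
    • ([^,]+): Match one or more non-comma characters and capture them in group #1
    • (?=(?>,[^,]*)*,\1(?>,|$)): Lookahead asserting that the same value as group #1 appears later in the list as a complete field, i.e. preceded by a comma and followed by a comma or the end ($)
    • ,: Match the comma after the captured value. This forces group #1 to be a whole field, and because the separator is consumed, replacing every match with an empty string deletes the earlier copies of each duplicate and keeps the last one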
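The answer's pattern can be checked the same way. This is a hedged Python sketch, not the PCRE tool itself, with two adaptations: `(?<=,|^)` becomes `(?:(?<=,)|^)` because Python's `re` requires fixed-width lookbehind, and the atomic groups `(?>...)` become ordinary `(?:...)` groups, which yield the same matches here (they only prune backtracking) and keep the sketch working on Python versions before 3.11, where `re` gained atomic groups. The helper names `earlier_duplicates` and `drop_earlier_duplicates` are hypothetical.

```python
import re

# Python port of the answer's pattern (assumptions, see lead-in):
# - (?<=,|^) -> (?:(?<=,)|^), since Python lookbehind must be fixed-width;
# - (?>...)  -> (?:...), equivalent matches here, works before Python 3.11.
EARLIER_DUP = re.compile(r"(?:(?<=,)|^)([^,]+)(?=(?:,[^,]*)*,\1(?:,|$)),")


def earlier_duplicates(csv: str) -> list:
    """Values that appear again later in the list (one hit per earlier copy)."""
    # With a single capturing group, findall returns group #1 of each match.
    return EARLIER_DUP.findall(csv)


def drop_earlier_duplicates(csv: str) -> str:
    """Delete every match (value plus its trailing comma), keeping the last copy."""
    return EARLIER_DUP.sub("", csv)


if __name__ == "__main__":
    s = "hello-world,goodbye-world,goodbye-world,hello-world"
    print(earlier_duplicates(s))       # ['hello-world', 'goodbye-world']
    print(drop_earlier_duplicates(s))  # goodbye-world,hello-world
```

Because `,\1(?:,|$)` anchors the later copy on both sides, a value is not mistaken for a prefix of a longer field: in `a,ab,abc` nothing matches.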