
Regex that only matches when no duplicate lines are found


I have a multiline string like this:

SA21 abcdef
BKxyz
SA21 abcdef

I need a regex that matches only if the line ^SA21 abcdef$ is present exactly once. So it should not match the first example, but it should match this one:

BK udsia
SA21 abcdef
BKxyz

I tried to capture the line and match only when the same line is not found later: /(^SA21 abcdef$)(?!\1)/m (regex101), but that does not work, as it will probably always match the last line...
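
For reference, the behaviour of this attempt can be reproduced with a short sketch. It assumes a Python translation of the pattern (the variable names are illustrative only). The lookahead (?!\1) inspects only the text directly after the captured line, so the pattern appears to succeed on the first occurrence even when the line is repeated further down:

```python
import re

# The attempted pattern from the question, translated to Python's re syntax.
ATTEMPT = re.compile(r"(^SA21 abcdef$)(?!\1)", re.MULTILINE)

duplicated = "SA21 abcdef\nBKxyz\nSA21 abcdef"
unique = "BK udsia\nSA21 abcdef\nBKxyz"

# (?!\1) only checks what immediately follows the captured line (a newline
# or the end of the string), so it never sees a later duplicate.
dup_match = ATTEMPT.search(duplicated)
uniq_match = ATTEMPT.search(unique)

print(dup_match is not None)   # True: matches although the line is duplicated
print(uniq_match is not None)  # True
```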


Solution

  • The regex should match only when the line occurs exactly once, that is, when the same line appears neither before nor after the occurrence being matched. This can be achieved with a tempered greedy token:

    /\A(?:(?!^SA21 abcdef$).)*(^SA21 abcdef$)(?:(?!^SA21 abcdef$).)*\z/ms
    

    See the regex demo

    The (?:(?!^SA21 abcdef$).)* part is the tempered greedy token: it matches any text, one character at a time, up to but not including the start of a SA21 abcdef line. The /s modifier is required so that . can also match a newline.
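
    The tempered greedy token approach can be sketched in Python as follows. This is a translation under assumptions, not the original demo: Python's \Z is used as the end-of-string anchor in place of \z, re.MULTILINE and re.DOTALL stand in for the /m and /s modifiers, and the helper name occurs_once_tempered is hypothetical.

```python
import re

# Python translation of the tempered-greedy-token pattern.
# Assumption: \Z is Python's end-of-string anchor, used here in place of \z.
TEMPERED = re.compile(
    r"\A(?:(?!^SA21 abcdef$).)*(^SA21 abcdef$)(?:(?!^SA21 abcdef$).)*\Z",
    re.MULTILINE | re.DOTALL,
)


def occurs_once_tempered(text: str) -> bool:
    """Return True when 'SA21 abcdef' appears as a whole line exactly once."""
    return TEMPERED.search(text) is not None


print(occurs_once_tempered("BK udsia\nSA21 abcdef\nBKxyz"))     # True
print(occurs_once_tempered("SA21 abcdef\nBKxyz\nSA21 abcdef"))  # False
```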

    However, this construct is resource-consuming because the lookahead is run before every single character, so it is a good idea to unroll it:

    /\A(?:(?!SA21 abcdef$).*\n+)*^(SA21 abcdef)$(?:\n+(?!SA21 abcdef$).*)*\z/m
    

    Note that \A and \z are unambiguous start/end string anchors; the /m modifier does not affect them.

    Pattern explanation:

    • \A - start of string
    • (?:(?!SA21 abcdef$).*\n+)* - zero or more sequences of:
      • (?!SA21 abcdef$) - at the start of a line, not followed with SA21 abcdef that is the whole line
      • .* - zero or more chars other than a newline
      • \n+ - 1 or more newlines
    • ^ - start of a line
    • (SA21 abcdef) - the line that must be single
    • $ - end of line
    • (?:\n+(?!SA21 abcdef$).*)* - zero or more sequences of:
      • \n+ - 1 or more newlines
      • (?!SA21 abcdef$) - not followed with SA21 abcdef that is the whole line
      • .* - zero or more chars other than a newline
    • \z - end of string.
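
    A sketch of the unrolled approach in Python, under a few assumptions: Python's \Z serves as the end-of-string anchor in place of \z, re.MULTILINE stands in for /m, the leading group consumes each non-target line together with the newlines after it, and the helper name occurs_once_unrolled is hypothetical. The lookahead now runs once per line rather than once per character.

```python
import re

# Unrolled variant: whole lines are consumed with .* and the lookahead is
# checked only at the start of each line.
# Assumption: \Z is Python's end-of-string anchor, used here in place of \z.
UNROLLED = re.compile(
    r"\A(?:(?!SA21 abcdef$).*\n+)*"   # leading lines that are not the target
    r"^(SA21 abcdef)$"                # the single target line
    r"(?:\n+(?!SA21 abcdef$).*)*\Z",  # trailing lines that are not the target
    re.MULTILINE,
)


def occurs_once_unrolled(text: str) -> bool:
    """Return True when 'SA21 abcdef' appears as a whole line exactly once."""
    return UNROLLED.search(text) is not None


print(occurs_once_unrolled("BK udsia\nSA21 abcdef\nBKxyz"))     # True
print(occurs_once_unrolled("SA21 abcdef\nBKxyz\nSA21 abcdef"))  # False
```

    Only lines that equal SA21 abcdef in full count as the target, so a line such as SA21 abcdefg is treated like any other line.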