
Finding unique characters in words with RegEx


I want to use RegEx to match the characters in a sequence that appear only once within that sequence. For example, in armored armadillo the only matches would be e, i and the space character.

My first attempt was to use lookaround to match characters such that no copy of the same character appeared before or after the match:

(.)(?<!^.*\1)(?!\1.*$)

But this appears to match no characters no matter what I do. What am I doing wrong? How can I match characters in the way that I want to?


Solution

  • You can use

    (.)(?<!\1.+)(?!.*\1)
    

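    The regex captures a char into Group 1 and then uses two lookarounds to make sure the same char occurs neither before nor after it. Because the lookbehind contains .+, the pattern needs a regex engine that allows variable-length lookbehinds (.NET and JavaScript since ES2018 do; Python's built-in re module does not).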
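As a minimal sketch of how the pattern behaves on the example from the question, assuming a JavaScript/TypeScript runtime with lookbehind support (ES2018 or later). The helper name findUniqueChars is hypothetical and only used for illustration:

```typescript
// Pattern from the answer, built from a string so the escaping is explicit.
const uniqueCharPattern = new RegExp("(.)(?<!\\1.+)(?!.*\\1)", "g");

// Hypothetical helper: returns the characters that occur exactly once,
// in order of appearance. An empty array means nothing is unique.
function findUniqueChars(text: string): string[] {
  const found = text.match(uniqueCharPattern);
  return found === null ? [] : found;
}

console.log(findUniqueChars("armored armadillo")); // [ 'e', ' ', 'i' ]
console.log(findUniqueChars("aabb"));              // []
```

Note that the space is reported as a match too, since it also occurs only once in the input.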

    To match any chars including line breaks, replace . with [\s\S] or prepend the pattern with the (?s) inline modifier ((?m) in Ruby).
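The difference the line break handling makes can be sketched as follows. This assumes JavaScript/TypeScript, where the sketch uses the s (dotAll) flag in place of an inline (?s) modifier; the sample string and variable names are made up for illustration:

```typescript
// Three lines: "ab", "ab", "c". Only "c" is unique across the whole text.
const multiLineSample = "ab\nab\nc";

// Default dot: lookarounds cannot cross a line break, so uniqueness is per line.
const perLinePattern = new RegExp("(.)(?<!\\1.+)(?!.*\\1)", "g");

// Variant 1: replace every . with [\s\S] so line breaks are included.
const classPattern = new RegExp("([\\s\\S])(?<!\\1[\\s\\S]+)(?![\\s\\S]*\\1)", "g");

// Variant 2: keep the dots and enable dotAll with the s flag.
const dotAllPattern = new RegExp("(.)(?<!\\1.+)(?!.*\\1)", "gs");

console.log(multiLineSample.match(perLinePattern)); // [ 'a', 'b', 'a', 'b', 'c' ]
console.log(multiLineSample.match(classPattern));   // [ 'c' ]
console.log(multiLineSample.match(dotAllPattern));  // [ 'c' ]
```

With the default dot, each line is effectively checked on its own, which is why a and b are reported twice.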

    Details

    • (.) - Group 1: any single char (other than line break char by default)
    • (?<!\1.+) - a negative lookbehind that fails the match if, immediately to the left of the current position, there is the Group 1 value followed by one or more chars other than line break chars, as many as possible (+ is required here so that . consumes at least one char, namely the one just captured into Group 1; otherwise \1 could match the captured char itself and the lookbehind would reject every position)
    • (?!.*\1) - a negative lookahead that fails the match if, immediately to the right of the current position, there are zero or more chars other than line break chars, as many as possible, followed by the Group 1 value.
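To illustrate why the + in the lookbehind matters, here is a small comparison, again assuming a JavaScript/TypeScript runtime with lookbehind support. The variable names are illustrative. The pattern from the question appears to run into the same problem: its lookbehind ends with \1 directly at the current position, so it sees the char that was just captured.

```typescript
const sampleWord = "armored armadillo";

// Working pattern: .+ forces the lookbehind to skip over the captured char.
const withPlus = new RegExp("(.)(?<!\\1.+)(?!.*\\1)", "g");

// With .* the lookbehind can match the captured char itself,
// so the negative lookbehind rejects every position.
const withStar = new RegExp("(.)(?<!\\1.*)(?!.*\\1)", "g");

// Pattern from the question: \1 in the lookbehind also lands on the captured char.
const questionAttempt = new RegExp("(.)(?<!^.*\\1)(?!\\1.*$)", "g");

console.log(sampleWord.match(withPlus));        // [ 'e', ' ', 'i' ]
console.log(sampleWord.match(withStar));        // null
console.log(sampleWord.match(questionAttempt)); // null
```

The lookahead in the question's pattern, (?!\1.*$), also only checks the char immediately after the match, whereas (?!.*\1) scans the rest of the line.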
