
Get distinct letters in a string in order of first occurrence


I am working in PL/SQL and I need to get the distinct letters within a string, returned in the order in which they first appear. I can do this by iterating over the string multiple times, but can I do it with a single regex instead?

Here are some examples:

  • 'aa' should return 'a'.
  • 'ab' should return 'ab'.
  • 'aabbcc' should return 'abc'.
  • 'abccba' should return 'abc'.
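For reference, the expected behaviour can be pinned down without any regex at all. This is a minimal sketch in Python rather than PL/SQL, for brevity only. The function name `distinct_in_order` is hypothetical. It makes a single pass over the string and keeps a character only the first time it is seen:

```python
def distinct_in_order(s: str) -> str:
    """Hypothetical helper: return each character of s once, in order of first occurrence."""
    seen = set()
    out = []
    for ch in s:
        if ch not in seen:  # keep only the first occurrence of each character
            seen.add(ch)
            out.append(ch)
    return "".join(out)


for example in ["aa", "ab", "aabbcc", "abccba"]:
    print(example, "->", distinct_in_order(example))
```

This reproduces all four examples above, so it can serve as a check for any regex-based approach.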

Solution

  • select regexp_replace('aabbxcc', '(.)\1+', '\1') from dual  -- returns 'abxc'
    
    • . matches any single character (except a newline, by default).
    • () captures that character in a group.
    • \1 is a backreference to the character captured by the first group.
    • + means the preceding element (here \1) occurs one or more times.

    In other words, the regular expression looks for runs of a repeated character, i.e. the same character appearing two or more times in succession, e.g. aa, aaa or aaaa.

    The replacement string \1 refers to the character captured by the group, which is a single character. Hence REGEXP_REPLACE replaces each run of a repeated character with a single occurrence of that character and keeps non-repeated characters as they are.
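To see what this pattern does to the examples, here is a small sketch that uses Python's `re.sub` as a stand-in for REGEXP_REPLACE. This is an assumption: the two regex dialects are not identical, but the capture group, the `\1` backreference and `+` behave the same way for this pattern. The name `collapse_runs` is hypothetical:

```python
import re


def collapse_runs(s: str) -> str:
    # (.) captures one character; \1+ matches one or more further copies of it.
    # Replacing with \1 keeps a single copy of each run of adjacent duplicates.
    return re.sub(r"(.)\1+", r"\1", s)


print(collapse_runs("aabbxcc"))  # abxc
print(collapse_runs("abccba"))   # abcba: the non-adjacent b's and a's survive
```

Only adjacent repeats are collapsed, which is why 'aabbcc' works but 'abccba' does not.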

    Refer to this db<>fiddle.

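    Of course, the above will not work for your last example, abccba, for which you want abc: it returns abcba, because the repeated b's and a's are not adjacent. I don't think that can be achieved with a single regular expression. Since you say you are using PL/SQL, you could instead apply REGEXP_REPLACE in a while loop until it no longer changes the source. Repeating the exact expression above would not help, though, since one pass already leaves no adjacent duplicates. The pattern inside the loop must also match non-adjacent repeats, for example (.)(.*)\1 replaced with \1\2, which removes a later copy of a character while keeping its first occurrence.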
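The loop idea can be sketched as follows, again in Python with `re.sub` standing in for REGEXP_REPLACE. It assumes the pattern `(.)(.*)\1` with replacement `\1\2`, not the run-collapsing pattern: each match is a character, whatever follows it, and a later copy of that same character, and the replacement drops the later copy. The helper name is hypothetical. Because `.` does not match a newline by default, the sketch also assumes single-line input:

```python
import re

# Assumed pattern: (.) captures a character, (.*) whatever follows it, and \1
# matches a later copy of that character. Replacing with \1\2 drops the later
# copy, so the first occurrence of every character always survives.
LATER_DUPLICATE = re.compile(r"(.)(.*)\1")


def distinct_by_regex_loop(s: str) -> str:
    """Hypothetical helper: repeat the replacement until the string stops changing."""
    while True:
        reduced = LATER_DUPLICATE.sub(r"\1\2", s)
        if reduced == s:  # no further duplicates found
            return s
        s = reduced


print(distinct_by_regex_loop("abccba"))  # abc
```

For abccba the passes give abccb, abcc and then abc. Each pass that changes anything shortens the string, so the loop terminates. For long strings, however, a single pass that tracks already-seen characters is simpler and cheaper than repeated regex scans.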