regex matlab parsing string-parsing

Remove duplicated characters using regex


How would you remove duplicated characters with a regex when some characters are meant to be repeated?

For example, I have "BBAALLLLOOOONN" and I want the output to just be BALLOON.

I tried the regex /(.)(?=\1)/g, but the result is "BALON" instead of "BALLOON".


Solution

  • Use

    regexprep(line, '([A-Za-z])\1', '$1')
    

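    ([A-Za-z]) is a capturing group that matches a single letter, and \1 is a backreference that matches the same letter again, so each match consumes a pair of identical adjacent letters. The replacement $1 puts back only the captured letter, so every pair collapses to one character. Matches do not overlap, so LLLL is matched as two pairs and becomes LL.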
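
As a minimal sketch of the replacement described above, the following applies it to the string from the question. The second input, 'AAAB', is a hypothetical example that is not from the original post; it is included only to show that the pattern assumes every letter was doubled, since a run of odd length keeps one extra character.

```matlab
% Input from the question: every letter of BALLOON is doubled.
line = 'BBAALLLLOOOONN';

% Replace each pair of identical adjacent letters with a single copy.
result = regexprep(line, '([A-Za-z])\1', '$1');
disp(result)   % BALLOON

% Hypothetical input with an odd run: only complete pairs are collapsed.
oddRun = regexprep('AAAB', '([A-Za-z])\1', '$1');
disp(oddRun)   % AAB
```

The character class [A-Za-z] limits the replacement to letters, so doubled digits, spaces, or punctuation would be left unchanged.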