How to use `preg_replace` to remove repeated chars with spaces around

I have multiple strings and need to remove repeated chars from them. For example, the string `here abbbbbc x` should become `here abc x`, and the string `test jjka` should become `test jka`.

After studying, I came up with the code below which works fine (it uses PHP but you can use any language):

echo preg_replace("/([a-z])\\1+/","$1","test ajjjo new");

The code above outputs `test ajo new`, which is great!

My problem now is that I need to replace the repeated chars only if they are inside a word or at the beginning or end of a word, not when they make up the whole word. For example, I need the string `here bbb cca` to become `here bbb ca` and the string `test hjjjja ppp` to become `test hja ppp`. I tried negating the space, `^` and `$`, but it all becomes a mess pretty fast.
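To make the gap concrete, here is a minimal sketch that runs the pattern from above on these inputs. The function name `collapse_all_runs` is only an illustrative helper, not anything built in:

```php
<?php
// Baseline pattern from the question: collapse ANY run of a repeated
// lowercase letter, including runs that make up a whole word.
// collapse_all_runs is a hypothetical helper name used for illustration.
function collapse_all_runs(string $text): string
{
    return preg_replace('/([a-z])\1+/', '$1', $text);
}

echo collapse_all_runs('test ajjjo new'), PHP_EOL;   // test ajo new
echo collapse_all_runs('here bbb cca'), PHP_EOL;     // here b ca  (wanted: here bbb ca)
echo collapse_all_runs('test hjjjja ppp'), PHP_EOL;  // test hja p (wanted: test hja ppp)
```

The last two lines show the unwanted behaviour: `bbb` and `ppp` are whole words and should be left alone.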

What would you recommend?

Solution

A simpler solution, as I thought there ought to be, making use of the "best regex trick ever" (https://www.rexegg.com/regex-best-trick.html):

\b(?<whole_word>[a-z])\k{whole_word}++\b(*SKIP)(*FAIL)|(?<not_whole_word>[a-z])\k{not_whole_word}++

which is exactly the same as (but less compact than) what @Wiktor Stribiżew commented:

\b([a-z])\1+\b(*SKIP)(*F)|([a-z])\2+

and replace with:

$not_whole_word

See: https://regex101.com/r/pa0GjG/1
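In PHP this could look roughly like the sketch below. It uses the compact pattern, and since `preg_replace` takes numbered references in the replacement string, the wanted group is written as `$2`. The function name `collapse_inner_runs` is a hypothetical helper chosen for the example:

```php
<?php
// Sketch: collapse repeated letters unless the run is a whole word by itself.
// collapse_inner_runs is a hypothetical helper name, not part of PHP.
function collapse_inner_runs(string $text): string
{
    // Alternative 1 matches a whole-word run and throws it away via (*SKIP)(*F).
    // Alternative 2 matches every remaining run; its letter is capture group 2.
    $pattern = '/\b([a-z])\1+\b(*SKIP)(*F)|([a-z])\2+/';
    return preg_replace($pattern, '$2', $text);
}

echo collapse_inner_runs('here bbb cca'), PHP_EOL;     // here bbb ca
echo collapse_inner_runs('test hjjjja ppp'), PHP_EOL;  // test hja ppp
echo collapse_inner_runs('test ajjjo new'), PHP_EOL;   // test ajo new
```

Only the second alternative ever produces a match, so group 2 is always set when a replacement happens.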

Explanation:

- `\b` if you find a whole word, i.e. starting at a word boundary
- `(?<whole_word>[a-z])\k{whole_word}++` followed by a repeated character that makes up the whole word until the
- `\b` end of the word
- `(*SKIP)(*FAIL)` then do not match, and skip past that word
- `|` in every other case
- `(?<not_whole_word>[a-z])` match a character that is
- `\k{not_whole_word}++` repeated
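The breakdown above can also be written into the pattern itself. The following is a sketch that assumes the `x` (extended) flag so each piece can carry a comment. The function name is hypothetical, and the replacement uses `$2` because `not_whole_word` is the second capture group:

```php
<?php
// Sketch: the named-group pattern in extended mode (x flag).
// collapse_inner_runs_named is a hypothetical helper name.
function collapse_inner_runs_named(string $text): string
{
    // "~" is used as the delimiter so "/" never needs escaping.
    $pattern = '~
        \b (?<whole_word>[a-z]) \k{whole_word}++ \b    # a run that spans a whole word
        (*SKIP)(*FAIL)                                 # discard it and resume after it
      |                                                # in every other case
        (?<not_whole_word>[a-z]) \k{not_whole_word}++  # a run inside a longer word
    ~x';
    // not_whole_word is capture group 2, hence the numbered reference.
    return preg_replace($pattern, '$2', $text);
}

echo collapse_inner_runs_named('here bbb cca'), PHP_EOL;     // here bbb ca
echo collapse_inner_runs_named('test hjjjja ppp'), PHP_EOL;  // test hja ppp
```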

OLD IDEA

You could use:

(?:(\b)|\B)(?!\k{char})(?<anything>.)(?<char>[a-z])\k{char}++(?(1)\B)

and replace with

$anything$char

See: https://regex101.com/r/yCNKY1/1

I guess there is a more obvious answer, but this should also work.

- `(?:(\b)|\B)` check whether you are at the beginning of a word or not. If so, group 1 will be set.
- `(?!\k{char})` check that the character of interest is not preceded by itself,
- `(?<anything>.)` i.e. it must be preceded by some other character
- `(?<char>[a-z])` match the character
- `\k{char}++` match all repetitions and do not give them up
- `(?(1)\B)` ensure that, if the start of the match was the start of a word, you are now not at the end of one, so you cannot match a complete word.