Regex to remove non-repeated spaces

I extract name information from a PDF in Python with fitz (PyMuPDF).

The problem is that most of the fields contain spaces to match the background, which gives me, for example, firstname = "P I E R R E" and lastname = "L E  D U C  D E  C O L" (with two spaces between the words).

I need to remove only the spaces that are not next to another space, i.e. the single spaces between the letters.

At first, of course, I removed all spaces with "s/\s//g", but for the last name that gives me "LEDUCDECOL", and I need "LE DUC DE COL".
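For comparison, here is a minimal sketch of that same "remove every whitespace character" substitution written with Python's re.sub. The sample string assumes the extracted text has two spaces between words. Deleting all whitespace also deletes the word boundaries:

```python
import re

# Assumed sample: single spaces between letters, two spaces between words
lastname = "L E  D U C  D E  C O L"

# Python equivalent of s/\s//g: delete every whitespace character
print(re.sub(r"\s", "", lastname))  # LEDUCDECOL
```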

Solution

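You could match a single space, then match any further spaces in a repeated capture group. A repeated capture group keeps only the value of its last iteration, so group 1 contains a single space when the space is followed by at least one more space, and does not participate in the match when the space stands alone.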

In the replacement, use the value of group 1 with \1. The pattern (note the leading space) is:

 ( )*
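To see the capture-group behaviour on its own, this short sketch inspects the groups directly with re.match; the sample strings are only illustrative. It relies on re.sub replacing a reference to an unmatched group with an empty string, which is the behaviour since Python 3.5 (older versions raise an error instead):

```python
import re

pattern = re.compile(r" ( )*")

# A lone space: the group never participates, so group(1) is None
lone = pattern.match(" x")
print(repr(lone.group(1)))  # None

# A run of three spaces: the group repeats twice and keeps only
# its last iteration, which is a single space
run = pattern.match("   x")
print(repr(run.group(0)), repr(run.group(1)))  # '   ' ' '

# re.sub substitutes an empty string for the unmatched group, so lone
# spaces disappear while runs of spaces shrink to one space
print(pattern.sub(r"\1", "P I E R R E"))  # PIERRE
```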

If you want to match any whitespace character, you can replace the space with \s, but note that \s also matches newlines (and tabs):

\s(\s)*
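A minimal sketch of that newline caveat, using a made-up string in which the name is split across two lines. The second pattern swaps \s for the character class [^\S\r\n] (any whitespace except carriage return and line feed), an idiom that is not part of the answer above, so that line breaks are left alone:

```python
import re

# Hypothetical sample: the name is split across two lines
text = "L E  D U C\nD E  C O L"

# \s also matches "\n", so the single newline is treated like a
# single space and removed, which joins "DUC" and "DE"
print(repr(re.sub(r"\s(\s)*", r"\1", text)))  # 'LE DUCDE COL'

# [^\S\r\n] matches whitespace other than "\r" and "\n",
# so the newline survives
print(repr(re.sub(r"[^\S\r\n]([^\S\r\n])*", r"\1", text)))  # 'LE DUC\nDE COL'
```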


For example:

import re
 
strings = [
    "L E  D U C  D E  C O L",
    "a        b     c def g"
]
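pattern = r" ( )*"  # a space, then any further spaces in group 1
for s in strings:
    # group 1 holds one space for a run of spaces, or is unset for a lone
    # space (re.sub substitutes an empty string for an unset group)
    print(re.sub(pattern, r"\1", s))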

Output

LE DUC DE COL
a b cdefg

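If you want to match a single space that is not followed by another space, you can use a negative lookahead with an empty string as the replacement. Note that this deletes the last space of every run, so a run of n spaces becomes n-1 spaces: it gives "LE DUC DE COL" when words are separated by exactly two spaces, but unlike the first pattern it does not collapse longer runs to a single space: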

 (?! )
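The sketch below compares the lookahead pattern on the double-spaced name and on a wider run of spaces. The last two calls use an added variant, not part of the answer, that also puts a negative lookbehind (?<! ) in front, so that only isolated spaces are removed and longer runs are kept as they are:

```python
import re

double = "L E  D U C  D E  C O L"  # two spaces between words
wide = "a" + " " * 4 + "b"          # a run of four spaces

# Negative lookahead: drop every space not followed by another space.
# This removes the last space of each run, so n spaces become n - 1.
print(re.sub(r" (?! )", "", double))        # LE DUC DE COL
print(repr(re.sub(r" (?! )", "", wide)))    # 'a   b'

# Added variant (an assumption, not from the answer): the space must
# also not be preceded by a space, so only isolated spaces are removed
print(repr(re.sub(r"(?<! ) (?! )", "", wide)))  # 'a    b'
print(re.sub(r"(?<! ) (?! )", "", double))      # LE  DUC  DE  COL
```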
