Regex to remove non-repeated spaces

I extract name information from a PDF in Python with fitz.

The problem is that most of the values are letter-spaced to fit the background, which gives me, for example, firstname = "P I E R R E" and lastname = "L E  D U C  D E  C O L".

I need to remove the spaces between characters that are not next to another space.

Of course, at first I removed all spaces with "s/\s//g", but for the last name that gives me "LEDUCDECOL" and I need "LE DUC DE COL".


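  • You could match a single space, then use a repeated capture group to match any optional following spaces. A repeated group keeps the value of its last iteration (a single space); if no spaces follow, the group does not take part in the match.

    In the replacement, use the group 1 value with \1. A run of spaces is then replaced by one space, and a lone space is replaced by the empty group, which removes it.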

     ( )*

    If you want to match any whitespace character, you can replace the space with \s, but note that it can also match a newline:

    \s(\s)*

    See a regex demo and a Python demo.

    For example:

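    import re

    strings = [
        "L E  D U C  D E  C O L",
        "a        b     c def g",
    ]
    # A space followed by a repeated group of optional spaces; group 1 is a
    # single space for a run of spaces and unset for a lone space.
    pattern = r" ( )*"
    for s in strings:
        # An unset group is replaced by an empty string (Python 3.5+).
        print(re.sub(pattern, r"\1", s))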


    Output:

    LE DUC DE COL
    a b cdefg

    If you want to match a single space that is not followed by another space, you can use a negative lookahead, and use an empty string in the replacement:

     (?! )

    See another regex demo.
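
For comparison, here is a minimal sketch of the negative lookahead variant in Python. The helper name remove_lone_spaces is only an illustration and does not come from the question. Note that, unlike the capture group pattern, this version does not collapse longer runs: a run of three or more spaces only loses its last space.

```python
import re

# A space that is NOT followed by another space (negative lookahead).
LONE_SPACE = re.compile(r" (?! )")

def remove_lone_spaces(text):
    """Hypothetical helper: delete every space not followed by a space."""
    return LONE_SPACE.sub("", text)

print(remove_lone_spaces("P I E R R E"))
print(remove_lone_spaces("L E  D U C  D E  C O L"))
```

With the sample names from the question this should give "PIERRE" and "LE DUC DE COL". If the extracted text can contain runs of more than two spaces, the capture group pattern is probably the safer choice.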