I extract name information from a PDF in Python with fitz.
The problem is that most of the values are letter-spaced to fit the background, which gives me, for example, firstname = "P I E R R E" and lastname = "L E  D U C  D E  C O L".
I need to remove every space between characters that is not next to another space.
Of course, at first I removed all spaces with "s/\s//g", but for the last name that gives me "LEDUCDECOL" and I need "LE DUC DE COL".
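You could match a single space, followed by a repeated capture group that matches any further spaces. A repeated group keeps only the value of its last iteration, so group 1 holds a single space when there was a run of spaces, and does not take part in the match when the space stood alone.
In the replacement, use the group 1 value with \1. In Python 3.5 and later, a group that did not take part in the match is replaced with an empty string, so a lone space is removed. The pattern is (note the leading space):
 ( )*
If you want to match any whitespace character, you can replace the space with \s, but note that it can also match a newline:
\s(\s)*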
See a regex demo and a Python demo.
For example:
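import re

strings = [
    "L E  D U C  D E  C O L",
    "a  b  c def g",
]
# A space followed by optional further spaces; group 1 keeps the last one
pattern = r" ( )*"
for s in strings:
    # \1 is one space for a run of spaces, and empty for a lone space
    print(re.sub(pattern, r"\1", s))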
Output
LE DUC DE COL
a b cdefg
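To illustrate the difference between the literal-space pattern and the \s pattern, here is a minimal sketch, assuming the extracted text can contain line breaks. The sample string and both function names are made up for the illustration.

```python
import re

SPACE_ONLY = re.compile(r" ( )*")        # literal spaces only
ANY_WHITESPACE = re.compile(r"\s(\s)*")  # also tabs and newlines


def collapse_spaces(text):
    """Drop lone spaces and shrink runs of spaces to one space."""
    return SPACE_ONLY.sub(r"\1", text)


def collapse_whitespace(text):
    """Same idea, but every whitespace character counts, including newlines."""
    return ANY_WHITESPACE.sub(r"\1", text)


sample = "P I E R R E\nL E  D U C"  # hypothetical two-line extraction
print(repr(collapse_spaces(sample)))      # 'PIERRE\nLE DUC'
print(repr(collapse_whitespace(sample)))  # 'PIERRELE DUC'
```

With \s the lone newline is treated like a lone space and removed, so the two lines are glued together. If the line breaks carry meaning, the literal-space pattern is probably the safer choice.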
If you want to match a single space that is not followed by another space, you can use a negative lookahead and replace the match with an empty string (note the leading space in the pattern):
 (?! )
See another regex demo.
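A minimal sketch of the lookahead variant, for comparison with the capture-group version. The helper name is made up for the illustration. One difference shows up if words can be separated by more than two spaces, which is an assumption and not something the question states: the lookahead removes only the last space of each run, so a longer run is shortened by one instead of collapsed to a single space.

```python
import re

# A space that is not followed by another space
LONE_OR_LAST_SPACE = re.compile(r" (?! )")


def remove_last_space_of_run(text):
    """Remove each space that has no space right after it."""
    return LONE_OR_LAST_SPACE.sub("", text)


print(remove_last_space_of_run("L E  D U C  D E  C O L"))  # LE DUC DE COL
print(repr(remove_last_space_of_run("a   b")))             # 'a  b'
print(repr(re.sub(r" ( )*", r"\1", "a   b")))              # 'a b'
```

For input where word gaps are always exactly two spaces, both approaches give the same result.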