Search code examples
pythonpython-re

Word pattern matching and substitute using re module


I have a text list with grammatic problem:

List1 = ['He had no hungeror sleepelse he would have told']

I want to get the full sentence but corrected, i.e.

['He had no hunger or sleep else he would have told']

I have created a conjunction list:

conj = ['else', 'or']

I am able to identify the line that contains the conjunction word else but not how to replace the word or remove else and then append else with a space between the two words sleep and else.

for line in List1:
    line = line.rstrip()
    if re.search('[A-Za-z]*else',line):
        print(line)

Please guide me how to do this.


Solution

  • There's two ways to do this; the right way and the wrong way. Since your question tagged 're', I'll assume you want the wrong way:

    >>> line = 'He had no hungeror sleepelse he would have told'
    >>> conj = ['else', 'or']
    >>> pattern = r"(?<=\S)(%s)(?=\s|$)" % ("|".join(conj))
    >>> re.sub(pattern, r' \1', line)
    'He had no hunger or sleep else he would have told'
    

    The right way would be to split the string, loop through each word, and each conj, and if the word ends in the conj, add the two separate words to the original string, and if it doesn't add the word on it's own.