python regex list data-cleaning

Taking out the last letter of some items in a list


I have a list like the following:

['Singh Sumer',
 'Li Sheng\n',
 'Hahn Vanessa',
 'Ruiter Dana',
 'Kleinbauer Thomas',
 'Klakow Dietrich\n',
 'Caselli Tommaso']

Some items have '\n' at the end. I want to remove it from those items and insert it as a separate item right after each of them.

So, I want to have this as an output:

['Singh Sumer',
 'Li Sheng',
 '\n',
 'Hahn Vanessa',
 'Ruiter Dana',
 'Kleinbauer Thomas',
 'Klakow Dietrich',
 '\n',
 'Caselli Tommaso']

I tried to get the indexes of the items that have '\n' at the end and insert '\n' at those indexes. But when I used the insert function, it replaced them with other members of the list. Any suggestions?


Solution

  • re.split with a lookahead splits each string just before a '\n': it returns [name, '\n'] when a newline is present and [name] otherwise

    import re
    
    lst = [
     'Singh Sumer',
     'Li Sheng\n',
     'Hahn Vanessa',
     'Ruiter Dana',
     'Kleinbauer Thomas',
     'Klakow Dietrich\n',
     'Caselli Tommaso'
    ]
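    r = []
    for s in lst:
        # The zero-width lookahead (?=\n) splits just before '\n' without consuming it,
        # so the newline survives as its own item (empty-match splits need Python 3.7+).
        r.extend(re.split(r'(?=\n)', s))
    print(r)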
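
If a regular expression feels like more than this needs, a plain loop with str.endswith is a possible alternative. This is only a sketch: the helper name split_trailing_newline is made up for illustration, and it assumes each item has at most one trailing '\n'.

```python
# Hypothetical helper (not part of the original answer): moves a trailing
# newline out of each string and into its own list item.
def split_trailing_newline(items):
    result = []
    for s in items:
        if s.endswith('\n'):
            result.append(s[:-1])  # the name without the final '\n'
            result.append('\n')    # the newline as a separate item
        else:
            result.append(s)
    return result


lst = ['Singh Sumer', 'Li Sheng\n', 'Hahn Vanessa']
print(split_trailing_newline(lst))
# ['Singh Sumer', 'Li Sheng', '\n', 'Hahn Vanessa']
```

Building a new list also avoids the index shifting that occurs when list.insert is called with indexes computed before the list started to grow, which may be what went wrong in the attempt described in the question.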