
How to get rid of lines surrounded by empty lines?


I have a text that looks something like this:

text
text
text


to remove

text
text
text


to remove
 

text
text
text

There are blocks of uninterrupted text, and I need to remove the lines that look like 'to remove' in the example above: each one has 2 empty lines above it and 1 empty line below it. Is there some way to programmatically remove those lines, together with the empty lines that surround them, in Python?
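The layout described above (two or more empty lines, one line of text, then at least one empty line) can be written down directly as a regular expression over the whole file contents. This is a minimal sketch, assuming the file fits in memory and that whitespace-only lines (such as the line holding a single space in the example) count as empty. `remove_isolated_lines` and `ISOLATED_LINE` are hypothetical names. Each match is replaced with a single newline, so the text blocks on either side stay separated by one empty line instead of being joined.

```python
import re

# (?m) makes ^ match at the start of every line.
# Two or more blank (whitespace-only) lines, then one line containing
# at least one non-space character, then one or more blank lines.
ISOLATED_LINE = re.compile(r'(?m)^(?:[ \t]*\n){2,}[^\n]*\S[^\n]*\n(?:[ \t]*\n)+')

def remove_isolated_lines(text):
    # Replace each match with one newline: the blocks before and after
    # the removed line end up separated by exactly one empty line.
    return ISOLATED_LINE.sub('\n', text)

# The example text from the question, including the line with a single space
sample = (
    "text\ntext\ntext\n\n\nto remove\n\n"
    "text\ntext\ntext\n\n\nto remove\n \n\n"
    "text\ntext\ntext\n"
)
print(remove_isolated_lines(sample))
```

On the sample, both `to remove` lines disappear and the three blocks of `text` remain, each separated by one empty line. To apply it to a file, read the contents with `f.read()`, pass the string through the function, and write the result back.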


Solution

  • This should work:

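    # Read every line of the file (each keeps its trailing '\n')
    with open('yourfile.txt') as f:
        lines = f.readlines()

    def is_blank(line):
        # Whitespace-only lines (spaces, tabs, a bare newline) count as empty
        return line.strip() == ''

    # Indices to drop: every blank line...
    to_drop = {i for i, line in enumerate(lines) if is_blank(line)}
    # ...plus every line whose neighbours above and below are both blank
    for i in range(1, len(lines) - 1):
        if is_blank(lines[i - 1]) and is_blank(lines[i + 1]):
            to_drop.add(i)

    result = [line for i, line in enumerate(lines) if i not in to_drop]

    # Overwrite the original file with the remaining lines
    with open('yourfile.txt', 'w') as f:
        f.writelines(result)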
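To check what this approach produces without touching a file, its two rules (drop every blank line, and drop every line with a blank line directly above and below it) can be wrapped in a function that works on a list of lines. This is a minimal sketch; `drop_blank_and_isolated` is a hypothetical name, and the sample input is the example text from the question.

```python
def is_blank(line):
    # Whitespace-only lines (including a lone space) count as empty
    return line.strip() == ''

def drop_blank_and_isolated(lines):
    # Rule 1: every blank line is dropped
    drop = {i for i, line in enumerate(lines) if is_blank(line)}
    # Rule 2: every line with a blank line directly above and below is dropped
    for i in range(1, len(lines) - 1):
        if is_blank(lines[i - 1]) and is_blank(lines[i + 1]):
            drop.add(i)
    return [line for i, line in enumerate(lines) if i not in drop]

sample = (
    "text\ntext\ntext\n\n\nto remove\n\n"
    "text\ntext\ntext\n\n\nto remove\n \n\n"
    "text\ntext\ntext\n"
)
result = drop_blank_and_isolated(sample.splitlines(keepends=True))
print(''.join(result))
```

The output is the nine `text` lines with no empty lines between them: besides removing the `to remove` lines, this approach also deletes the empty lines that separated the blocks. Note that a genuine one-line block of text sitting between empty lines would be removed too, since rule 2 only looks at the immediate neighbours.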