
How to delete a set of lines that match given text from a huge .txt file


I have a huge .txt file. After every 100 lines, the following block of lines repeats:

    ITEM: TIMESTEP
    1000100
    ITEM: NUMBER OF ATOMS
    100
    ITEM: BOX BOUNDS pp pp pp
    -5.63124 5.63124
    -5.63124 5.63124
    -5.63124 5.63124
    ITEM: ATOMS id mol type xu yu zu vx vy vz

The block above appears around 10,000 times. How do I get rid of these lines specifically?


Solution

  • You can check each line for the block's first line, ITEM: TIMESTEP, and when you find it, skip the 8 lines that follow it (the rest of the block).

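    # Iterate over the file lazily so the whole file is never held in memory.
    with open('samp.txt') as f:
        for line in f:
            # rstrip() tolerates a trailing space or newline after the header text
            if line.rstrip() == 'ITEM: TIMESTEP':
                # skip the 8 remaining lines of the block
                for _ in range(8):
                    next(f, None)
            else:
                print(line.strip())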

    Output:

    17 1 1 -2.20243 -5.29512 -4.4049 -1.7509 -0.678094 -2.92041
    21 1 1 -0.574106 -4.73233 -5.02726 0.630247 -1.43315 0.144725
    50 1 1 -6.78421 -4.33292 -5.62459 2.38831 0.400303 -2.2132
    27 1 1 -2.43637 -3.6223 -5.19709 1.75747 0.293975 0.56135
    26 1 1 -2.28676 -3.00667 -4.51059 1.85878 -2.28114 2.43501
    ...
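
Since the goal is to remove the lines from the file rather than only print the rest, the same idea can be sketched as a function that writes the kept lines to a second file. This is a minimal sketch, not a definitive implementation: the function name `strip_header_blocks`, its parameters, and the output filename used below are made up for illustration, and it assumes every block is exactly 9 lines long and starts with `ITEM: TIMESTEP`, as in the question.

```python
from itertools import islice


def strip_header_blocks(src_path, dst_path, header="ITEM: TIMESTEP", block_len=9):
    """Copy src_path to dst_path, dropping each block_len-line block
    whose first line equals header. Returns the number of blocks dropped.

    Hypothetical helper: assumes all blocks have the same fixed length.
    """
    removed = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            # rstrip() ignores the newline and any trailing spaces
            if line.rstrip() == header:
                # Consume the remaining block_len - 1 lines of the block
                # from the same file iterator without storing them.
                for _ in islice(src, block_len - 1):
                    pass
                removed += 1
            else:
                # Write data lines through unchanged.
                dst.write(line)
    return removed
```

A call such as `strip_header_blocks('samp.txt', 'samp_clean.txt')` reads one line at a time, so memory use stays small regardless of file size. Writing to a separate file and replacing the original only after checking the result is safer than editing a huge file in place.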