I have a very large text file, and I want to extract several specific parts of it and write only those parts to a new text file. My approach was to first find the line numbers where each desired part begins and ends, and then use them as the ranges for slicing. The reason is that the text file also contains large sections of descriptions and annotations that I need to get rid of. Should I use `itertools.islice`?
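```python
DataStartLine = []  # line numbers of the "#C imageFile" lines (start of a data block)
DataEndLine = []    # line numbers of the "#S" lines (end of a data block)

with open("KMAP_2018_04_23_071018_fast_00001.txt", "r") as KMAPspec:
    for x, line in enumerate(KMAPspec):
        if "#C imageFile" in line:
            DataStartLine.append(x)
        if "#S" in line:
            DataEndLine.append(x)

with open("output.txt", "w") as out:
    ...  # how do I write only the lines between each start/end pair?
```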
When the text file is really big, keeping its content in a variable is risky, because you can run out of memory.
In your case, it seems you can read and write in the same pass. If the `#C` and `#S` lines should be excluded from the output:
```python
with open("KMAP_2018_04_23_071018_fast_00001.txt", "r") as KMAPspec:
    with open("output.txt", "w") as out:
        should_write = False
        for line in KMAPspec:
            # When I meet this line, stop writing out
            if "#S" in line:
                should_write = False
            # Write out only if between the two tags
            if should_write:
                out.write(line)
            # When I meet this line, start writing out
            if "#C imageFile" in line:
                should_write = True
```
This way, only one line at a time is held in memory.
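The same single-pass logic can be wrapped in a generator so it works on any iterable of lines: an open file, or a plain list when testing. This is a minimal sketch; the function name `extract_between` and its parameters are hypothetical, and the markers are matched as substrings, just like `line.find(...) != -1` in the loop above, with the marker lines excluded.

```python
from typing import Iterable, Iterator


def extract_between(lines: Iterable[str], start: str = "#C imageFile",
                    end: str = "#S") -> Iterator[str]:
    """Yield the lines strictly between a start-marker line and the next end-marker line.

    Hypothetical helper: markers are matched as substrings and the marker
    lines themselves are not yielded, mirroring the loop above.
    """
    inside = False
    for line in lines:
        # An end marker closes the current section before anything is yielded
        if end in line:
            inside = False
        if inside:
            yield line
        # A start marker opens a section; the marker line itself is skipped
        if start in line:
            inside = True


if __name__ == "__main__":
    sample = [
        "#S 1 header\n",
        "annotation\n",
        "#C imageFile a.edf\n",
        "1 2 3\n",
        "4 5 6\n",
        "#S 2 header\n",
        "more notes\n",
        "#C imageFile b.edf\n",
        "7 8 9\n",
    ]
    # prints the three data lines: "1 2 3", "4 5 6", "7 8 9"
    print("".join(extract_between(sample)), end="")
```

With real files, the generator can be passed straight to `writelines`, e.g. `out.writelines(extract_between(KMAPspec))` inside the same two `with` blocks, so memory use stays at one line at a time.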
If the boundary lines should be included:
```python
with open("KMAP_2018_04_23_071018_fast_00001.txt", "r") as KMAPspec:
    with open("output.txt", "w") as out:
        should_write = False
        for line in KMAPspec:
            # When I meet this line, start writing out
            if "#C imageFile" in line:
                should_write = True
            # Write out only if between the two tags
            if should_write:
                out.write(line)
            # When I meet this line, stop writing out
            if "#S" in line:
                should_write = False
```
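As for `itertools.islice` from the question: it can slice an iterator by line index, but it only moves forward, so with several ranges you have to account for the lines already consumed. The sketch below assumes the `(start, end)` pairs of 0-based line numbers have already been collected (for example from `DataStartLine` and `DataEndLine`), sorted, non-overlapping, and with `start < end`; the helper name `slice_ranges` is hypothetical. It needs a second pass over the file, so the single-pass loops above are usually simpler.

```python
from itertools import islice
from typing import Iterable, Iterator, Sequence, Tuple


def slice_ranges(lines: Iterable[str],
                 ranges: Sequence[Tuple[int, int]]) -> Iterator[str]:
    """Yield the lines strictly between each (start, end) pair of 0-based line numbers.

    Hypothetical helper. Assumes the pairs are sorted, non-overlapping and
    start < end; the marker lines themselves are not yielded.
    """
    it = iter(lines)
    pos = 0  # 0-based index of the next line `it` will return
    for start, end in ranges:
        # Discard everything up to and including the start-marker line
        # (the "consume" idiom: islice(it, n, n) skips n items, yields none).
        skip = start + 1 - pos
        next(islice(it, skip, skip), None)
        # Yield the lines between the two markers.
        yield from islice(it, end - start - 1)
        pos = end  # the end-marker line is the next one `it` will return
```

Building the pairs needs some care: each start should be paired with the first end index after it. If the file begins with a `#S` line before the first `#C imageFile`, a plain `zip(DataStartLine, DataEndLine)` would pair the wrong lines.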