I am trying to change the delimiter of a large file that is about 4GB. The delimiter is currently "#|#" and I want the delimiter to be "|".
I tried to do a find and replace, but because the file is so large my computer does not have enough memory to finish running the code. I was wondering if there is a way to read the file line by line instead, to save memory.
text = open("C:\\test.txt", "r")
text = ''.join([i for i in text]).replace("#|#", "|")
x = open("C:\\test.txt","w")
x.writelines(text)
x.close()
This is what the file currently looks like:
FIELD #|# FIELD #|# FIELD #|#
and I want it to look like
FIELD | FIELD | FIELD |
Sure, you can process the file line by line. In fact, the idiomatic way to handle files is to use the file object as a context manager and iterate over its lines. Since the delimiter "#|#" contains no newline, it can never be split across two lines, so replacing within each line is safe:
import shutil
with open("C:\\test.txt", "r") as long_file, \
        open("C:\\test_replaced.tmp", "w") as replacement:
    for line in long_file:
        replacement.write(line.replace("#|#", "|"))
shutil.move("C:\\test_replaced.tmp", "C:\\test.txt")
This works as long as there is enough free disk space for the temporary copy. I do not have a good, succinct standard-library solution for changing the file in place, but this should already be much faster and more memory-efficient than reading the whole content into memory and iterating over it twice.
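One caveat: if the file turns out to have very long lines (or no newlines at all), line-by-line iteration no longer bounds memory. Below is a sketch of a chunk-based variant for that case; replace_stream is a hypothetical name of my own, not part of any library. It holds back up to two trailing characters between reads, so a "#|#" that straddles a chunk boundary is still found, and it mimics str.replace's left-to-right, non-overlapping semantics:

```python
def replace_stream(fin, fout, old="#|#", new="|", chunk_size=1 << 20):
    """Replace every `old` with `new` while reading fixed-size chunks.

    Memory stays bounded even with no newlines: at most roughly
    chunk_size + len(old) characters are held at a time.
    """
    carry = ""
    while True:
        chunk = fin.read(chunk_size)
        buf = carry + chunk
        if not chunk:
            # End of input: no delimiter can straddle a boundary any more.
            fout.write(buf.replace(old, new))
            return
        parts = []
        i = 0
        while True:
            j = buf.find(old, i)
            if j == -1:
                break
            parts.append(buf[i:j])       # text before the match
            parts.append(new)            # the replacement
            i = j + len(old)             # skip past the match (no overlap)
        # Hold back up to len(old) - 1 trailing characters: they might be
        # the start of a delimiter whose remainder arrives in the next chunk.
        cut = max(i, len(buf) - len(old) + 1)
        parts.append(buf[i:cut])
        fout.write("".join(parts))
        carry = buf[cut:]

# Usage (hypothetical paths):
# with open("C:\\test.txt") as fin, open("C:\\test_replaced.tmp", "w") as fout:
#     replace_stream(fin, fout)
```

The same shutil.move step would then swap the temporary file into place, as in the line-by-line version.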