
How to find and replace a string in Python in a large file


I am trying to change the delimiter of a large file (about 4 GB). The delimiter is currently "#|#", and I want it to be "|".

I tried a find and replace, but because the file is so large, my computer runs out of memory before the code finishes. Is there a way to read the file line by line instead, to save memory?

text = open("C:\\test.txt", "r")
text = ''.join([i for i in text]).replace("#|#", "|")
x = open("C:\\test.txt","w")
x.writelines(text)
x.close()

This is what the file currently looks like:

FIELD #|# FIELD #|# FIELD #|#

and I want it to look like

FIELD | FIELD | FIELD |


Solution

  • Yes, you can process the file line by line. In general, the idiomatic way to handle files is to use the file object as a context manager and as an iterator of lines:

    import shutil
    
    with open("C:\\test.txt", "r") as long_file, \
         open("C:\\test_replaced.tmp", "w") as replacement:
        for line in long_file:
            replacement.write(line.replace("#|#", "|"))
    
    shutil.move("C:\\test_replaced.tmp", "C:\\test.txt")
    

    This works as long as you have enough disk space to write the temporary file alongside the original. I do not have a good, succinct solution using the standard library for changing the file in place, but this should already be much faster and more memory efficient than the original code, which reads the whole content into memory and then builds a second full copy with the replacement applied. Here only one line is held in memory at a time.
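
The same temporary-file approach can be wrapped in a reusable function. This is a minimal sketch, not a definitive implementation: the function name `replace_delimiter`, its parameters, and the UTF-8 default encoding are assumptions made for illustration. It creates the temporary file in the same directory as the original, so the final `os.replace` stays on one filesystem, and it opens both files with `newline=""` so existing line endings are written back unchanged.

```python
import os
import tempfile


def replace_delimiter(path, old="#|#", new="|", encoding="utf-8"):
    """Rewrite the file at `path` line by line, replacing `old` with `new`.

    Hypothetical helper: the name, parameters, and default encoding are
    illustrative assumptions, not part of the original answer.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives next to the original so os.replace does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" disables newline translation, so line endings are preserved.
        with os.fdopen(fd, "w", encoding=encoding, newline="") as dst, \
             open(path, "r", encoding=encoding, newline="") as src:
            for line in src:  # only one line is held in memory at a time
                dst.write(line.replace(old, new))
        # Swap the rewritten file into place, overwriting the original.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling `replace_delimiter("C:\\test.txt")` would then rewrite the file from the question. Note that line-by-line processing only keeps memory use low if the file actually contains line breaks; a file that is one enormous line would still be read in a single piece.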