
Replacing text in files and writing the results to a subdirectory doubles the replacement text when iterating over multiple files


When I search and replace text in a single file and write out just that one file, it works as expected: it creates a new file in the "output" subdirectory containing the existing text "This" with the new text "AndThat" on the next line. However, when I iterate over all the files in a directory, the new text appears twice in each output file. I don't understand why. Here is the code:

import os
import shutil
import pathlib


def replace_text_in_multiple_files(input_path, output_path):
    search_text = "This"
    new_text = "This\nAndThat"

    shutil.rmtree(output_path)
    os.mkdir(output_path)

    for subdir, dirs, files in os.walk(input_path):
        for file in files:
            input_file_path = subdir + os.sep + file
            output_file_path = output_path + os.sep + file
            if input_file_path.endswith(".txt"):
                s = pathlib.Path(input_file_path).read_text()
                s = s.replace(search_text, new_text)

                with open(output_file_path, "w") as f:
                    f.write(s)

def replace_text_in_a_single_files(input_file_path, output_file_path):
    search_text = "This"
    new_text = "This\nAndThat"

    s = pathlib.Path(input_file_path).read_text()
    s = s.replace(search_text, new_text)

    with open(output_file_path, "w") as f:
        f.write(s)

replace_text_in_multiple_files("D:\\Test\\", "D:\\Test\\output\\")
#replace_text_in_a_single_files("D:\\Test\\File1.txt", "D:\\Test\\output\\File1.txt")

In the directory 'D:\Test' I have 3 text files. Each of the text files contains the following text:

This
is 
a
test

If I run 'replace_text_in_a_single_files', it opens File1.txt, searches for the text, replaces it with the same text plus 'AndThat' on a new line, and writes the result to a new file in the output subdirectory, which gives the following:

This
AndThat
is 
a
test

However, when I run 'replace_text_in_multiple_files', which does the same thing for every file in the directory instead of just one, each new file gets the replacement text twice:

This
AndThat
AndThat
is 
a
test

So, it's like it's executing the replacement code twice. But why? And why only when it's iterating?

I was expecting that it would just produce the following text in each of the files.

This
AndThat
is 
a
test

Solution

  • os.walk is iterating over your own output files as well as the input files. Adding a print call inside the loop shows this:

    import os
    import shutil
    import pathlib
    
    
    def replace_text_in_multiple_files(input_path, output_path):
        search_text = "This"
        new_text = "This\nAndThat"
    
        shutil.rmtree(output_path)
        os.mkdir(output_path)
    
        for subdir, dirs, files in os.walk(input_path):
            for file in files:
                print(subdir, file)
                input_file_path = subdir + os.sep + file
                output_file_path = output_path + os.sep + file
                if input_file_path.endswith(".txt"):
                    s = pathlib.Path(input_file_path).read_text()
                    s = s.replace(search_text, new_text)
    
                    with open(output_file_path, "w") as f:
                        f.write(s)
    
    def replace_text_in_a_single_files(input_file_path, output_file_path):
        search_text = "This"
        new_text = "This\nAndThat"
    
        s = pathlib.Path(input_file_path).read_text()
        s = s.replace(search_text, new_text)
    
        with open(output_file_path, "w") as f:
            f.write(s)
    
    replace_text_in_multiple_files("./Test", "./Test/output/")

    Output:

    ./Test File3.txt
    ./Test File2.txt
    ./Test File1.txt
    ./Test/output File3.txt
    ./Test/output File2.txt
    ./Test/output File1.txt
    

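    Your script writes each output file as soon as it reads the matching file in the input folder. Because the output folder is a subdirectory of the input folder, os.walk then descends into it, finds the freshly written files, and processes them as well. Each of those files already contains "This\nAndThat", so the replacement is applied a second time and the result is written back to the same path in the output folder, which is where the extra "AndThat" comes from. The single-file function never reads from the output folder, so it replaces only once.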
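
One possible fix is to stop os.walk from descending into the output folder. The sketch below prunes the `dirs` list in place, which os.walk honours in its default top-down mode. It is a minimal variant of the function from the question, not the only way to solve this. The `search_text` and `new_text` keyword parameters, `ignore_errors=True`, and `exist_ok=True` are my own additions for illustration and are not in the original code.

```python
import os
import pathlib
import shutil


def replace_text_in_multiple_files(input_path, output_path,
                                   search_text="This",
                                   new_text="This\nAndThat"):
    # Resolve both paths so they can be compared reliably below.
    input_dir = pathlib.Path(input_path).resolve()
    output_dir = pathlib.Path(output_path).resolve()

    # Start from an empty output directory.
    # Assumption: a missing output directory should not be an error.
    shutil.rmtree(output_dir, ignore_errors=True)
    output_dir.mkdir(parents=True, exist_ok=True)

    for subdir, dirs, files in os.walk(input_dir):
        # Prune the output directory in place so os.walk never enters it.
        dirs[:] = [d for d in dirs
                   if (pathlib.Path(subdir) / d).resolve() != output_dir]

        for file in files:
            if file.endswith(".txt"):
                text = (pathlib.Path(subdir) / file).read_text()
                (output_dir / file).write_text(
                    text.replace(search_text, new_text))
```

Calling `replace_text_in_multiple_files("D:\\Test\\", "D:\\Test\\output\\")` should then read each input file exactly once. If the input files all live directly in one folder, iterating over that folder without recursion, or keeping the output folder outside the input folder, would avoid the problem as well.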