python, join, concatenation

Python - merge multiple files based on file prefix


Python 2.7

I have multiple files:

file A_01.txt filecontent: aaaa

file A_02.txt filecontent: bbbb

file B_01.txt filecontent: aaaB

file B_02.txt filecontent: bbbB

file D_01.txt filecontent: aaaD

file D_02.txt filecontent: bbbD

I need to create a "merged" file for each file prefix:

for files starting with A_, create merged_A.txt and put the content of all files starting with A_ into it,

merged_B.txt for files starting with B_,

and the same for every other prefix.

# get all files in folder
files = os.listdir("C:\\MTA\\mta")

for filename in files:
    # get prefix
    prefix = filename[0:3]

    # open destination file to merge individual files into
    with open(os.path.join("C:\\MTA\\mta", "merged" + "_" + prefix + ".txt"), 'w') as outfile:
        # go through all files and merge it into outfile
        for file in files:
            with open(os.path.join("C:\\MTA\\mta", filename)) as infile:
                outfile.write(infile.read())
            outfile.write("--------------\n")

The code above generates the merged files, but every merged file contains the content of all files.

files = os.listdir("C:\\MTA\\mta")

for filename in files:
    # get prefix
    prefix = filename[0:3]

    # open destination file to merge individual files into
    with open(os.path.join("C:\\MTA\\mta", prefix + "file.siem"), 'w') as outfile:
        # go through all files and merge it into outfile
        # for filename in files:
        with open(os.path.join("C:\\MTA\\mta", filename)) as infile:
            outfile.write(infile.read())
        outfile.write("--------------\n")

This version writes the content of only one file into each merged file.


Solution

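  • You open the output file in write mode ('w') every time you read an input file. That truncates it, so earlier content is lost; open it in append mode ('a') instead. The nested for-loop is also unnecessary, because the outer loop already visits every file. Since append mode keeps whatever is already in the file, delete old merged files before running the script again. This should work: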

    import os

    folder = "C:\\MTA\\mta"

    # get all files in folder, in a stable order
    files = sorted(os.listdir(folder))

    for filename in files:
        # skip merged output left over from a previous run
        if filename.startswith("merged_"):
            continue

        # get prefix: the part before the first underscore ("A" for "A_01.txt")
        prefix = filename.split("_")[0]

        # open destination file in append mode so earlier content is kept
        with open(os.path.join(folder, "merged_" + prefix + ".txt"), 'a') as outfile:
            # copy this one input file into its merged file
            with open(os.path.join(folder, filename)) as infile:
                outfile.write(infile.read())
            outfile.write("--------------\n")
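
If appending feels fragile (running the script twice doubles the content), one alternative is to group the file names by prefix first and then write each merged file exactly once in 'w' mode. The following is only a sketch under some assumptions: the function name merge_by_prefix and its separator parameter are made up for illustration, the prefix is taken to be everything before the first underscore, and any file whose name starts with "merged_" is treated as earlier output and skipped. It should run on both Python 2.7 and Python 3.

```python
import os
from collections import defaultdict


def merge_by_prefix(folder, separator="--------------\n"):
    """Merge files in `folder` into merged_<prefix>.txt, one file per prefix.

    Hypothetical helper for illustration; returns the merged file paths.
    """
    groups = defaultdict(list)
    for filename in sorted(os.listdir(folder)):
        path = os.path.join(folder, filename)
        # Skip subfolders and merged output left over from earlier runs.
        if not os.path.isfile(path) or filename.startswith("merged_"):
            continue
        # Assumption: the prefix is the text before the first underscore,
        # e.g. "A" for "A_01.txt".
        prefix = filename.split("_")[0]
        groups[prefix].append(path)

    merged_paths = []
    for prefix, paths in sorted(groups.items()):
        merged_path = os.path.join(folder, "merged_" + prefix + ".txt")
        # 'w' truncates, so every run rebuilds the merged file from scratch.
        with open(merged_path, "w") as outfile:
            for path in paths:
                with open(path) as infile:
                    outfile.write(infile.read())
                outfile.write(separator)
        merged_paths.append(merged_path)
    return merged_paths
```

Calling merge_by_prefix("C:\\MTA\\mta") would then rebuild merged_A.txt, merged_B.txt and merged_D.txt on each run. Because every output file is opened once and overwritten, there is nothing to clean up between runs.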