
Python - Combine text data from group of file by filename


I have the list of text files below, and I want to combine each group of files like this:

Inv030001.txt - should have all data of files starting with Inv030001

Inv030002.txt - should have all data of files starting with Inv030002

[Image: list of the text files in the directory]

I tried the code below, but it's not working:

filenames = glob(textfile_dir+'*.txt')
for fname in filenames:
    filename = fname.split('\\')[-1]
    current_invoice_number = (filename.split('_')[0]).split('.')[0]
    prev_invoice_number = current_invoice_number
    with open(textfile_dir + current_invoice_number+'.txt', 'w') as outfile:
        for eachfile in fnmatch.filter(os.listdir(textfile_dir), '*[!'+current_invoice_number+'].txt'):
            current_invoice_number = (eachfile.split('_')[0]).split('.')[0]
            if(current_invoice_number == prev_invoice_number):
                with open(textfile_dir+eachfile) as infile:
                    for line in infile:
                        outfile.write(line)
                prev_invoice_number = current_invoice_number
            else:
                with open(textfile_dir+eachfile) as infile:
                    for line in infile:
                        outfile.write(line)
                prev_invoice_number = current_invoice_number
                #break;

Solution

  • Does this answer your question? My version appends the data from files with "like" invoice numbers to a .txt file named with just the invoice number. In other words, anything that starts with "Inv030001" will have its contents appended to "Inv030001.txt". The idea is that you likely don't want to overwrite files and possibly destroy them if your write logic has a mistake.

    I actually recreated your files to test this, and did exactly what I suggested you do: I treated every part as a separate task and built it up from there. In doing that, the script became far less verbose and convoluted. I labeled all of my comments with "task" to drive home that this is just a series of very simple things.

    I also renamed your variables to what they actually are. For instance, filenames doesn't hold filenames at all; it holds entire paths.

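    import os
    from glob import glob

    # you'll have to change this path to yours
    root = os.path.join(os.getcwd(), 'texts')

    # sorted so each group is appended in a predictable order
    paths = sorted(glob(os.path.join(root, '*.txt')))

    for path in paths:
        # task: get filename (uses the platform's path separator, not just '\\')
        filename = os.path.basename(path)
        # task: get invoice number (drop the extension, then anything after the first '_')
        invnum = os.path.splitext(filename)[0].split('_')[0]
        # task: build the output path
        out_path = os.path.join(root, f'{invnum}.txt')
        # skip the combined file itself so it is never appended to itself
        if os.path.abspath(path) == os.path.abspath(out_path):
            continue
        # task: open in and out files
        with open(out_path, 'a') as out_, open(path, 'r') as in_:
            # task: append in contents to out
            out_.write(in_.read())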
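
Because the script above appends, running it twice would write every file's data into the combined file a second time. One possible variation, sketched below, groups the paths by invoice number first and then writes each combined file exactly once into a separate output directory, so reruns give the same result. The function name combine_by_prefix and the out_dir parameter are hypothetical, and the sketch assumes file names of the form "<invoice number>_<anything>.txt" or "<invoice number>.txt".

```python
import os
from collections import defaultdict
from glob import glob


def combine_by_prefix(src_dir, out_dir):
    """Merge the .txt files in src_dir that share an invoice-number prefix.

    Assumption: the prefix is everything before the first '_' in the file
    name (or the whole name, minus the extension, if there is no '_').
    """
    # task: group full paths by invoice number
    groups = defaultdict(list)
    for path in sorted(glob(os.path.join(src_dir, '*.txt'))):
        stem = os.path.splitext(os.path.basename(path))[0]
        groups[stem.split('_')[0]].append(path)

    # task: write one combined file per invoice number
    os.makedirs(out_dir, exist_ok=True)
    for invnum, members in groups.items():
        # 'w' only replaces earlier combined output; this assumes out_dir
        # is a different directory from src_dir, so no input is overwritten
        with open(os.path.join(out_dir, invnum + '.txt'), 'w') as out_:
            for member in members:
                with open(member, 'r') as in_:
                    out_.write(in_.read())
    return dict(groups)
```

Calling combine_by_prefix('texts', 'combined') would leave the originals in texts untouched and put one file per invoice number in combined. Note that sorted() orders names as plain strings, so a suffix like "_10" sorts before "_2"; if the order inside each combined file matters, a custom sort key would be needed.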