Concatenate all files whose names map to values under the same key

I have a dictionary that groups different patterns:

dico_cluster = {'cluster_1': ['CUX2', 'CUX1'], 'cluster_2': ['RFX3', 'RFX2'], 'cluster_3': ['REST']}

Then I have files in a folder:

"/path/to/test/files/CUX1.txt"
"/path/to/test/files/CUX2.txt"
"/path/to/test/files/RFX3.txt"
"/path/to/test/files/RFX2.txt"
"/path/to/test/files/REST.txt"
"/path/to/test/files/ZEB.txt"
"/path/to/test/files/TEST.txt"

I'm trying to concatenate the files that belong to the same cluster. The output file name should be the names of the patterns in that cluster, joined by an underscore "_".
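
To make the naming rule concrete, here is a small sketch that computes the expected output file name for each cluster. The name expected_names is only illustrative and is not part of my actual code.

```python
# Cluster dictionary copied from above
dico_cluster = {'cluster_1': ['CUX2', 'CUX1'],
                'cluster_2': ['RFX3', 'RFX2'],
                'cluster_3': ['REST']}

# expected_names is an illustrative name: cluster key -> output file name
expected_names = {clee: '_'.join(patterns) + '.txt'
                  for clee, patterns in dico_cluster.items()}

for clee, name in expected_names.items():
    print(clee, '->', name)
```

So cluster_1 should end up in CUX2_CUX1.txt, while ZEB.txt and TEST.txt belong to no cluster and should not be copied anywhere.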

I tried this:

import glob
import shutil

filenames = glob.glob('/path/to/test/files/*.txt')

for clee in dico_cluster.keys():
    fname='_'.join(dico_cluster[clee])
    outfilename ='/path/to/test/outfiles/'+ fname + ".txt"
    for file in filenames:
        tf_file=file.split('/')[-1].split('.')[0]
        if tf_file in dico_cluster[clee]:
            with open(outfilename, 'wb') as outfile:
                for filename in filenames:
                    if filename == outfilename:
                        # don't want to copy the output into the output
                        continue
                    with open(filename, 'rb') as readfile:
                        shutil.copyfileobj(readfile, outfile)

But it's not working: it just concatenates all the files. I only want to concatenate the files that are in the same cluster.

Solution

I would recommend using the os module (in particular os.path) for the path handling; it is easier and more robust than splitting strings by hand.

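If I understood your problem correctly, the issue is that once a file matches the cluster, your inner loop copies every file in filenames into the output, and reopening the output with 'wb' truncates it on each match. I would instead load the content of the matching files first and then write it out once per cluster: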

import glob
import os

filenames = glob.glob('/path/to/test/files/*.txt')

for clee in dico_cluster:
    # Drop duplicates but keep the dictionary order, so the output name is predictable
    my_clusters = list(dict.fromkeys(dico_cluster[clee]))
    fname = "_".join(my_clusters)
    outfilename = os.path.join("/path/to/test/outfiles", fname + ".txt")
    data = []
    for file in filenames:
        # "/path/to/test/files/CUX1.txt" -> "CUX1"
        tf_file = os.path.splitext(os.path.basename(file))[0]
        if tf_file in my_clusters:
            with open(file, 'rb') as f1:
                data.extend(f1.readlines())

    # Write everything collected for this cluster in one go
    with open(outfilename, "wb") as _output_file:
        for elm in data:
            _output_file.write(elm)
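
If the files are large, a variant that streams each file with shutil.copyfileobj (as in your attempt) avoids holding everything in memory. The following is only a sketch under some assumptions: concat_clusters is a hypothetical helper name, input paths are built from the pattern names instead of globbing, files are concatenated in the order given in the dictionary, and missing input files are silently skipped.

```python
import os
import shutil


def concat_clusters(clusters, in_dir, out_dir):
    """Write one concatenated file per cluster and return the output paths.

    Hypothetical helper: the name and signature are illustrative only.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for members in clusters.values():
        out_path = os.path.join(out_dir, "_".join(members) + ".txt")
        # Open the output once per cluster, so "wb" truncates it only once
        with open(out_path, "wb") as outfile:
            for name in members:
                in_path = os.path.join(in_dir, name + ".txt")
                if not os.path.isfile(in_path):
                    continue  # assumption: silently skip missing inputs
                with open(in_path, "rb") as readfile:
                    shutil.copyfileobj(readfile, outfile)
        written.append(out_path)
    return written
```

With your data it could be called as concat_clusters(dico_cluster, "/path/to/test/files", "/path/to/test/outfiles"). Because the output is opened outside the loop over the inputs, each cluster file receives only its own members.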