Search code examples
python-3.xdataframemergeconcatenation

Concatenate two data frames with the same name from different directories via Python


I have two folders in different directories that contain excel files with the same name. I need to concatenate the excel files, not sheets, because each file has one sheet, that have the same name. for example:

d1 contains A, B, C files

d2 contains A, B, E, F file

The output should contain two files, the result from concatenating A from d1 and A from d2, and the result from concatenating B from d1 and B from d2.

If d1 and d2 have no files with the same name, there will be no output.

the first data frame has multiple rows but the second data frame has one row. When I run this code, it considers the second data frame as a header, and when I set the header as false it no longer takes it under consideration.

import glob
import numpy as np

file_d1 = glob.glob(d1 + "/*.xlsx")
file_d2 = glob.glob(d2 + "/*.xlsx")
i=0
for file_d1 in glob.glob(d1 + "/*.xlsx"):
    fileName_d1 = os.path.splitext(os.path.splitext(os.path.basename(file_d1))[0])[0]
    for file_d2 in glob.glob(d2 + "/*.xlsx"):
        fileName_d2 = os.path.splitext(os.path.splitext(os.path.basename(file_d2))[0])[0]
        if fileName_d1 == fileName_d2:
            i+=1
            fileName_1=d1+'/'+fileName_d1+'.xlsx'
        df1 = pd.read_excel(fileName_1, header=0, index= False)
        fileName_2=d2+'/'+fileName_d2+'.xlsx'
        df2 = pd.read_excel(fileName_2, header=0, index= False)
        print(fileName_1)
        print(fileName_2)
        df = pd.DataFrame(np.concatenate([df1.values, df2.values], axis=0), columns=df2.columns)
df.to_excel('C:/Users/khouloud.ayari/Desktop/FinalResult/output'+str(i)+'.xlsx', index = True, header=False)

What should I do to get the correct output, which is (1st data frame + 2nd data frame) not the other way around?

when I concatenated the two files, the "nbr de Reboot" was gone and the "nbr de km parcourus" was considered as a header df.to_excel('C:/Users/khouloud.ayari/Desktop/FinalResult/output'+str(i)+'.xlsx', index = False, header=True)

output via console

1   nbr de Kilomètres parcourus  1
0       Passage en mode privé  1
1      Passage en mode public  2

output .xlsx

enter image description here

expected output: enter image description here A and B are three different files

PS: I'm using Python 3.7(Spyder)


Solution

  • My goal was to concatenate two excel files with the same name located in different directories but I failed to do so because some data was lost in the process, so I added headers to my excel files and I run the code below:

    import os 
    import pandas as pd
    
    d2 = "C:/Users/khouloud/Desktop/d2"
    d1 = "C:/Users/khouloud/Desktop/d1"
    
    import glob
    import numpy as np
    file_d1 = glob.glob(d1 + "/*.xlsx")
    file_d2 = glob.glob(d2 + "/*.xlsx")
    i=0
    for file_d1 in glob.glob(d1 + "/*.xlsx"):
        fileName_d1 = os.path.splitext(os.path.splitext(os.path.basename(file_d1))[0])[0]
        for file_d2 in glob.glob(d2 + "/*.xlsx"):
            fileName_d2 = os.path.splitext(os.path.splitext(os.path.basename(file_d2))[0])[0]
            if fileName_d1 == fileName_d2:
                i+=1
                fileName_1=d1+'/'+fileName_d1+'.xlsx'
                df1 = pd.read_excel(fileName_1, header=0, index= False)
                fileName_2=d2+'/'+fileName_d2+'.xlsx'
                df2 = pd.read_excel(fileName_2, header=0, index= False)
                df = pd.DataFrame(np.concatenate([df1.values, df2.values]), columns=df2.columns)
                print('**********************************************')
                print('1',df)
                print('**********************************************')
                df.to_excel('C:/Users/khouloud/Desktop/FinalResult/output'+str(i)+'.xlsx', index = False, header=False)