python pandas numpy concatenation

How to open a folder with multiple DataFrames in Python and merge them into one CSV file



I have around 700 CSV files that all have exactly the same columns, and I need to merge them all into one CSV file.

This is the data. It is all in one folder, and the file names follow a pattern: each name is a date, e.g. 07 25 2018 is 07252018.

07252018 = {name: "Carlos", age:"30", height: "15" }

name     age   height
Carlos   30    15



07262018 = {name: "Carlos", age:"30", height: "15" }

name     age   height
Carlos   30    15



And so on, for around 700 CSV files.

What I have done:

  • It works, but it is very manual and needs a lot of typing, since there are 700 CSVs.

import pandas as pd

# Variable names cannot start with a digit, so each one gets a df_ prefix.
df_03012018 = pd.read_csv("Data/03012018")
df_03022018 = pd.read_csv("Data/03022018")
df_03032018 = pd.read_csv("Data/03032018")
df_03042018 = pd.read_csv("Data/03042018")
df_03052018 = pd.read_csv("Data/03052018")
# and so on..

file = pd.concat([df_03012018, df_03022018, df_03032018, df_03042018, df_03052018])

file.to_csv("Data/file")


The expected result is a better way to do this: fast, and without a lot of typing.


Solution

  • IIUC, this should do it:

    Option 1:

    Less efficient, more readable:

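    import os
    import pandas as pd

    def get_df():
        frames = []
        for file in os.listdir():
            if file.endswith('.csv'):
                frames.append(pd.read_csv(file))
        # DataFrame.append was removed in pandas 2.0, so the frames are
        # collected in a list and combined with pd.concat instead.
        return pd.concat(frames) if frames else pd.DataFrame()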
    

    And then:

    df=get_df()
    

    Option 2:

    More memory efficient, less readable:

    # assumes os and pandas (as pd) are already imported
    def df_generator():
        for file in os.listdir():
            if file.endswith('.csv'):
                aux = pd.read_csv(file)
                yield aux
    

    And then:

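    generator = df_generator()
    # DataFrame.append was removed in pandas 2.0; pd.concat consumes the
    # generator directly (it raises ValueError if no CSV file is found).
    df = pd.concat(generator)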
    

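    Note: for this to work as is, the script has to be run from INSIDE the folder with the CSVs, because os.listdir() with no argument lists the current working directory. Otherwise, you'll need to pass the relative path to that folder from the folder your script will be in.

    Example: If your script is in the folder "Project" and inside that folder you have the folder "Tables" with all your CSVs, you do:

    os.listdir('Tables/')

    os.listdir returns bare file names, so the read then has to include the folder as well: pd.read_csv(os.path.join('Tables', file)).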
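
Neither option above writes the merged result back to disk, which is what the question asks for. Here is a minimal sketch of the whole round trip, assuming the files end in `.csv` and share the same columns. The function name `merge_csv_folder` and the output name `merged.csv` are made up for illustration; `Data` is the folder from the question.

```python
from pathlib import Path

import pandas as pd


def merge_csv_folder(folder, output_path):
    """Read every .csv file in `folder` and write the rows to one CSV.

    Hypothetical helper, not part of pandas.
    """
    folder = Path(folder)
    output_path = Path(output_path)
    # Sort for a reproducible order, and skip the output file in case it
    # lives in the same folder and the function is run more than once.
    csv_files = sorted(
        p for p in folder.glob("*.csv")
        if p.resolve() != output_path.resolve()
    )
    if not csv_files:
        raise FileNotFoundError(f"no .csv files found in {folder}")
    # pd.concat accepts any iterable of DataFrames, so the files are
    # read one after another and stacked in a single call.
    merged = pd.concat((pd.read_csv(p) for p in csv_files), ignore_index=True)
    # index=False keeps the row index out of the output file.
    merged.to_csv(output_path, index=False)
    return merged
```

Usage would look like `merged = merge_csv_folder("Data", "merged.csv")`. Because the paths come from `glob` with the folder already attached, the script does not have to sit inside the data folder. Note that names like 07252018 sort by month first, so the row order is not chronological across years; if that matters, the date would need to be parsed from the file name.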