Search code examples
pythonpandascsvmergeconcatenation

Merge several .csv into one csv in python


Good evening,

So I have a huge amount of .csvs which I either want to change in one giant csv before reading it with pandas, or directly creating a df with all the .csvs in it. The .csvs all have two columns "timestamp" and "holdings". Now I want to merge them on the "timestamp"-column if they match with each other and create a new column for each "holdings"-column. So far I produced this:

import os
import glob
import pandas as pd

os.chdir("C/USer....")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

dfs = [pd.read_csv(f, index_col=[0], parse_dates=[0])
        for f in os.listdir(os.getcwd()) if f.endswith('csv')]

The output is a list with dfs. How do I merge them on "timestamp" column now? I tried to concate and merge already, but it always puts them in a single column.


Solution

  • What you are looking for is an outer join between the dataframes. Since the pandas merge function only operates between two dataframes, we need to loop over each dataframe and merge them individually. We can use the reduce iterator from functools to do this cleanly in one line:

    import pandas as pd
    from functools import reduce
    
    df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['timestamp'],
                                            how='outer'), dfs)
    

    Use the suffixes argument in the merge function to clean up your column headings.