Merge several .csv into one csv in python

Good evening,

So I have a huge amount of .csvs which I either want to change in one giant csv before reading it with pandas, or directly creating a df with all the .csvs in it. The .csvs all have two columns "timestamp" and "holdings". Now I want to merge them on the "timestamp"-column if they match with each other and create a new column for each "holdings"-column. So far I produced this:

import os
import glob
import pandas as pd

os.chdir("C/USer....")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

dfs = [pd.read_csv(f, index_col=[0], parse_dates=[0])
        for f in os.listdir(os.getcwd()) if f.endswith('csv')]

The output is a list with dfs. How do I merge them on "timestamp" column now? I tried to concate and merge already, but it always puts them in a single column.

Solution

What you are looking for is an outer join between the dataframes. Since the pandas merge function only operates between two dataframes, we need to loop over each dataframe and merge them individually. We can use the reduce iterator from functools to do this cleanly in one line:

import pandas as pd
from functools import reduce

df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['timestamp'],
                                        how='outer'), dfs)

Use the suffixes argument in the merge function to clean up your column headings.