Good evening,
So I have a huge amount of .csvs which I either want to change in one giant csv before reading it with pandas, or directly creating a df with all the .csvs in it. The .csvs all have two columns "timestamp" and "holdings". Now I want to merge them on the "timestamp"-column if they match with each other and create a new column for each "holdings"-column. So far I produced this:
import os
import glob
import pandas as pd
os.chdir("C/USer....")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
dfs = [pd.read_csv(f, index_col=[0], parse_dates=[0])
for f in os.listdir(os.getcwd()) if f.endswith('csv')]
The output is a list with dfs. How do I merge them on "timestamp" column now? I tried to concate
and merge
already, but it always puts them in a single column.
What you are looking for is an outer join between the dataframes. Since the pandas merge
function only operates between two dataframes, we need to loop over each dataframe and merge them individually. We can use the reduce
iterator from functools
to do this cleanly in one line:
import pandas as pd
from functools import reduce
df_merged = reduce(lambda left,right: pd.merge(left,right,on=['timestamp'],
how='outer'), dfs)
Use the suffixes
argument in the merge
function to clean up your column headings.