I am trying to combine multiple csv files for ETF data. These csv files have the following data structure.
Date Open High Low Close Volume
0 31/12/2018 16.00 16.22 15.83 16.22 113550
1 28/12/2018 16.59 16.60 16.22 16.22 196076
2 27/12/2018 17.04 17.10 16.66 16.66 77764
3 24/12/2018 18.12 18.16 17.50 17.51 137047
4 21/12/2018 17.33 18.00 17.20 17.74 162391
5 20/12/2018 17.13 17.42 16.90 17.42 118405
I have used glob to collect the names of all the csv files into a list.
import glob
# List all csv files in the current directory
files = glob.glob('*.csv')
The output of files looks like this:
['BBOZ.csv', 'CORE.csv', 'DJRE.csv', 'ETPMAG.csv', 'ETPMPD.csv', 'ETPMPM.csv', 'GOLD.csv', 'HACK.csv', 'IGB.csv', 'IJR.csv', 'IXJ.csv', 'MOAT.csv', 'MVS.csv', 'NDQ.csv', 'OZR.csv', 'SPY.csv', 'STW.csv', 'TECH.csv', 'USD.csv', 'VAE.csv', 'VAP.csv', 'VAS.csv', 'VDHG.csv', 'VGE.csv', 'VGS.csv', 'VTS.csv', 'YANK.csv', 'ZUSD.csv']
Each csv file is named after an ETF ticker symbol.
I'd like to create a single dataframe that takes the ['Close'] column from each csv file. The date should be the first column, followed by one column per ticker symbol holding that symbol's close values.
So the output looks like this:
Date BBOZ CORE DJRE ETPMAG ETPMPD .... ZUSD
31/12/2018 16 17 18 19 20 ... 21
30/12/2018 16 17 18 19 20 ... 22
29/12/2018 16 17 18 19 20 ... 23
28/12/2018 16 17 18 19 20 ... 24
etc
I'm stuck on how to create that dataframe.
You can load your columns into a dictionary keyed by ticker symbol and then pass it to pd.concat:
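import glob
import os
import pandas as pd

col_list = {}
for fname in glob.iglob('*.csv'):
    base, _ = os.path.splitext(fname)
    # Index by Date so that concat aligns rows on the date rather than on row position.
    # (squeeze=True was removed from read_csv in pandas 2.0, so select the column instead.)
    col_list[base] = pd.read_csv(fname, usecols=['Date', 'Close'], index_col='Date')['Close']
pd.concat(col_list, axis=1)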
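If the files do not all cover the same trading days, it may help to parse the dates and let pd.concat align on them. Below is a minimal sketch of that idea, not a definitive implementation. The helper name combine_close and the tickers AAA and BBB are hypothetical, the second sample file is made up for illustration, and the day-first date format is assumed from the sample in the question.

```python
import os
import tempfile

import pandas as pd


def combine_close(paths):
    """Return one DataFrame of Close prices: one column per file, indexed by date."""
    columns = {}
    for path in paths:
        # Use the file name without its extension as the column name (the ticker)
        ticker = os.path.splitext(os.path.basename(path))[0]
        frame = pd.read_csv(path, usecols=['Date', 'Close'])
        # Assumption: dates are day-first, e.g. 31/12/2018
        frame['Date'] = pd.to_datetime(frame['Date'], format='%d/%m/%Y')
        columns[ticker] = frame.set_index('Date')['Close']
    # concat aligns the Series on the Date index; a date missing from a file becomes NaN
    return pd.concat(columns, axis=1).sort_index(ascending=False)


# Demo with two small hypothetical files written to a temporary directory
header = 'Date,Open,High,Low,Close,Volume\n'
samples = {
    'AAA': header
    + '31/12/2018,16.00,16.22,15.83,16.22,113550\n'
    + '28/12/2018,16.59,16.60,16.22,16.22,196076\n',
    'BBB': header
    + '31/12/2018,10.00,10.60,9.90,10.50,5000\n'
    + '27/12/2018,10.10,10.40,10.00,10.25,4000\n',
}

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for ticker, text in samples.items():
        path = os.path.join(tmp, ticker + '.csv')
        with open(path, 'w') as fh:
            fh.write(text)
        paths.append(path)
    combined = combine_close(paths)

print(combined)
```

With your own data you would call combine_close(glob.glob('*.csv')). If you want Date as an ordinary first column rather than the index, calling .reset_index() on the result should do it.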