Tags: python, pandas, csv, dataframe, finance

Merge csv file columns and name columns


I am trying to combine multiple csv files for ETF data. These csv files have the following data structure.

           Date     Open   High    Low  Close  Volume
0      31/12/2018  16.00  16.22  15.83  16.22  113550
1      28/12/2018  16.59  16.60  16.22  16.22  196076
2      27/12/2018  17.04  17.10  16.66  16.66   77764
3      24/12/2018  18.12  18.16  17.50  17.51  137047
4      21/12/2018  17.33  18.00  17.20  17.74  162391
5      20/12/2018  17.13  17.42  16.90  17.42  118405

I have used glob to collect the names of all the csv files into a list.

import glob
# List all csv files in the current directory
files = glob.glob('*.csv')

The output of files looks like this:

['BBOZ.csv', 'CORE.csv', 'DJRE.csv', 'ETPMAG.csv', 'ETPMPD.csv', 'ETPMPM.csv', 'GOLD.csv', 'HACK.csv', 'IGB.csv', 'IJR.csv', 'IXJ.csv', 'MOAT.csv', 'MVS.csv', 'NDQ.csv', 'OZR.csv', 'SPY.csv', 'STW.csv', 'TECH.csv', 'USD.csv', 'VAE.csv', 'VAP.csv', 'VAS.csv', 'VDHG.csv', 'VGE.csv', 'VGS.csv', 'VTS.csv', 'YANK.csv', 'ZUSD.csv']

Each csv file is named after an ETF symbol.

I'd like to create a single dataframe that takes the ['Close'] column from each csv file. The date should be the first column, followed by one column per ticker symbol holding that symbol's close values.

The desired output looks like this:

Date       BBOZ CORE DJRE ETPMAG ETPMPD .... ZUSD
31/12/2018 16   17   18   19     20     ...  21
30/12/2018 16   17   18   19     20     ...  22
29/12/2018 16   17   18   19     20     ...  23
28/12/2018 16   17   18   19     20     ...  24

etc

I'm stuck on how to create that dataframe.


Solution

  • You can load your columns into a dictionary and then pass it to pd.concat:

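    import glob
    import os

    import pandas as pd

    col_list = {}
    for fname in glob.iglob('*.csv'):
        # 'BBOZ.csv' -> 'BBOZ': the file name without its extension is the ticker
        base, _ = os.path.splitext(fname)
        # Index by Date so the values from different files line up by day.
        # read_csv no longer accepts squeeze=True (removed in pandas 2.0),
        # so select the Close column explicitly to get a Series.
        col_list[base] = pd.read_csv(fname, usecols=['Date', 'Close'], index_col='Date')['Close']

    # The dictionary keys become the column names
    df = pd.concat(col_list, axis=1)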
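If you also want Date as a regular first column, with the rows sorted newest first as in the question, the same idea can be wrapped in a small function. This is only a sketch under a few assumptions: the helper name combine_close is made up for illustration, every file is assumed to have Date and Close columns, and the dates are assumed to be day-first strings such as 31/12/2018.

```python
import glob
import os

import pandas as pd


def combine_close(pattern="*.csv"):
    """Build one DataFrame of Close prices, one column per matching CSV file.

    Hypothetical helper: the name and the default pattern are illustrative.
    """
    closes = {}
    for fname in sorted(glob.glob(pattern)):
        # 'BBOZ.csv' -> 'BBOZ'; the file name is assumed to be the ticker.
        ticker = os.path.splitext(os.path.basename(fname))[0]
        frame = pd.read_csv(fname, usecols=["Date", "Close"])
        # Assumption: dates are written day-first, e.g. 31/12/2018.
        frame["Date"] = pd.to_datetime(frame["Date"], format="%d/%m/%Y")
        closes[ticker] = frame.set_index("Date")["Close"]

    # Outer join on Date: a day missing from one file becomes NaN there.
    combined = pd.concat(closes, axis=1).rename_axis("Date")
    # Newest date first, then move Date out of the index into the first column.
    return combined.sort_index(ascending=False).reset_index()
```

Calling combine_close() in the folder that holds the files should give a frame whose columns are Date followed by one column per ticker. Parsing the dates matters for the sort: as plain strings, 31/12/2018 would sort by day of month rather than chronologically.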