Search code examples
pythoncsvmergepandasforex

Python 2.7 and Pandas merge 2 csv files with Forex Data


I have two csv files

1: eurusd.csv 2: xauusd.csv

there is no header in the files, but the data is Date Time, Open, High, Low, Close, Volume. each file has the following type of data...

eurusd.csv:

02/23/2009 18:01,1.27090,1.27110,1.27040,1.27050,204
02/23/2009 18:02,1.27060,1.27060,1.27000,1.27040,211
02/23/2009 18:03,1.27040,1.27050,1.27010,1.27040,82
02/23/2009 18:04,1.27020,1.27080,1.27020,1.27060,98

xauusd.csv

02/23/2009 18:01,991.260,992.120,990.800,991.840,69
02/23/2009 18:02,991.800,992.260,991.200,991.870,74
02/23/2009 18:04,991.820,991.830,990.700,991.320,74

I want to merge the data based on the date time column which is the first column in both files. as you can see the second file does not have exact same records as the first file, so missing some data from second file, but thats ok. I want to bring the Close column from second file to the first file based on matching date tile field between them

so the final merged csv file will have the following columns... Date Time, Open, High, Low, Close, Volume, CloseFromSecondCsv

merged.csv

02/23/2009 18:01,1.27090,1.27110,1.27040,1.27050,204,991.840
02/23/2009 18:02,1.27060,1.27060,1.27000,1.27040,211,991.870
02/23/2009 18:03,1.27040,1.27050,1.27010,1.27040,82,0
02/23/2009 18:04,1.27020,1.27080,1.27020,1.27060,98,991.320

im not sure how to do this. thanks in advance


Final working code based on TomAugspurger's answer:

import pandas as pd
df1 = pd.read_csv("C:\IQFEED\XAUUSDO.COMP_1.csv", index_col=0, names=['time', 'open', 'high', 'low', 'close', 'volume'], parse_dates=True)
df2 = pd.read_csv("C:\IQFEED\EURUSD.COMP_1.csv", index_col=0, names=['time', 'open', 'high', 'low', 'close', 'volume'], parse_dates=True)
df2['other_close'] = df1['close']
df2.fillna(0)
df2.to_csv('C:\IQFEED\python.csv')

I figured out using column numbers only way...

import pandas as pd
df1 = pd.read_csv("C:\IQFEED\XAUUSD.txt", index_col=0, usecols=[0,4], parse_dates=True, header=None)
df2 = pd.read_csv("C:\IQFEED\EURUSD.txt", index_col=0, usecols=[0,1,2,3,4,5], parse_dates=True, header=None)
df2[6] = df1[4]
df2.to_csv('C:\IQFEED\python1.csv')

Solution

  • If I understand your question correctly, Andy had the right idea in his comment. You don't want to merge to two, you want to copy the column of one into the other.

    In [48]: df1 = pd.read_csv(StringIO(b), index_col=0, names=['time', 'open', 'high', 'low', 'close', 'volume'], parse_dates=True)
    
    
    In [49]: df2 = pd.read_csv(StringIO(a), index_col=0, names=['time', 'open', 'high', 'low', 'close', 'volume'], parse_dates=True)
    
    
    In [71]: df1
    Out[71]: 
                           open    high    low   close  volume
    time                                                      
    2009-02-23 18:01:00  991.26  992.12  990.8  991.84      69
    2009-02-23 18:02:00  991.80  992.26  991.2  991.87      74
    2009-02-23 18:04:00  991.82  991.83  990.7  991.32      74
    
    In [72]: df2
    Out[72]: 
                           open    high     low   close  volume
    time                                                       
    2009-02-23 18:01:00  1.2709  1.2711  1.2704  1.2705     204
    2009-02-23 18:02:00  1.2706  1.2706  1.2700  1.2704     211
    2009-02-23 18:03:00  1.2704  1.2705  1.2701  1.2704      82
    2009-02-23 18:04:00  1.2702  1.2708  1.2702  1.2706      98
    
    
    
    
    
    In [51]: df2['other_close'] = df1['close']
    
    
    In [52]: df2
    Out[52]: 
                           open    high     low   close  volume  other_close
    time                                                                    
    2009-02-23 18:01:00  1.2709  1.2711  1.2704  1.2705     204       991.84
    2009-02-23 18:02:00  1.2706  1.2706  1.2700  1.2704     211       991.87
    2009-02-23 18:03:00  1.2704  1.2705  1.2701  1.2704      82          NaN
    2009-02-23 18:04:00  1.2702  1.2708  1.2702  1.2706      98       991.32
    

    You can do df2.fillna(0) to get your expected answer.