Search code examples
pythonparsingcsvpandasmetatrader4

Import historical Metatrader CSV data to Python with Pandas library (date/time parsing)


I have some CSV data like

2011.12.08,22:45,1.33434,1.33465,1.33415,1.33419,265
2011.12.08,23:00,1.33419,1.33542,1.33419,1.33472,391
2011.12.08,23:15,1.33470,1.33483,1.33383,1.33411,420
2011.12.08,23:30,1.33413,1.33451,1.33389,1.33400,285

coming from Metatrader 4 in a file named EURUSD15.csv

I would like to import this file with Python using Pandas library and read_csv function...

So I did this :

#!/usr/bin/env python
from pandas import *
df = read_csv('data/EURUSD15.csv', header=None)
df.columns = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume']
print(df)

I would like now to have date/time parsed...

so I changed

df = read_csv('data/EURUSD15.csv', header=None)

to

df = read_csv('data/EURUSD15.csv', header=None, parse_dates=[[1, 2]])

But I get this error message

Exception: Length mismatch (7 vs 6)

How can I parse date and time columns and have the 2 columns considered as 1 "datetime" column.


Solution

  • The columns are zero indexed, so you need to do parse_dates=[[0,1]]
    This is on latest version of pandas but should work with 0.8.0+:

    In [74]: data = """\
    2011.12.08,22:45,1.33434,1.33465,1.33415,1.33419,265
    2011.12.08,23:00,1.33419,1.33542,1.33419,1.33472,391
    2011.12.08,23:15,1.33470,1.33483,1.33383,1.33411,420
    2011.12.08,23:30,1.33413,1.33451,1.33389,1.33400,285
    """
    
    In [75]: pd.read_csv(StringIO(data), 
                         names=['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume'], 
                         index_col='Date_Time', parse_dates=[[0, 1]])
    Out[75]: 
                            Open     High      Low    Close  Volume
    Date_Time                                                      
    2011-12-08 22:45:00  1.33434  1.33465  1.33415  1.33419     265
    2011-12-08 23:00:00  1.33419  1.33542  1.33419  1.33472     391
    2011-12-08 23:15:00  1.33470  1.33483  1.33383  1.33411     420
    2011-12-08 23:30:00  1.33413  1.33451  1.33389  1.33400     285
    

    Note the index_col=0 will also work. Complex date parsing prepends resulting columns so parse_dates will refer to pre-date processing column indices (i.e., 0 is Date and 1 is Time) and index_col refers to post-date processing column indices. Thus, using column names are recommended since it allows you to not have to think about pre-vs-post processing columns indices.