Search code examples
pythonpandasfinance

python pandas empty Dataframe


You need these 2 files for the short script:

The problem is to get the index to Date. My index is always empty

import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import pandas_datareader.data as web
import datetime
 
paper = pd.read_csv('PreisMoatStanley2013.dat',
                    delimiter = ' ',
                    parse_dates=[0,1,100,101])
 
 
 
data = pd.DataFrame({'GoogleWE': paper['Google End Date'],
                    'debt': paper['debt'].astype(np.float64),
                    'DJIADate': paper['DJIA Date'],
                    'DJIAClose': paper['DJIA Closing Price']
                    .astype(np.float64)})
data.set_index('DJIADate')
print paper [:5]
 
import quandl
#api_key = open('quandl_key.txt','r').read()
 
djia = pd.read_csv("djia.csv", index_col=0)
 
print djia[:3]
 
djia_closes = djia['Close'].reset_index()
djia_closes.set_index('Date')
print djia_closes[:3]
#s152
data = pd.merge(data, djia_closes,
left_on='DJIADate', right_on='Date')
data.drop(['DJIADate'], inplace=True, axis=1)
data = data.set_index('Date') # Problem seems to be here
 
print data[:3]

Shell message:

Empty DataFrame

Columns: [DJIAClose, GoogleWE, debt, Close]

Index: []

How it should looks like:

Date       /DJIAClose /GoogleWE  /debt /Close
2004-01-12 10485.18   2004-01-10 0.21  10485.2
2004-01-20 10528.66   2004-01-17 0.21  10528.7
2004-01-26 10702.51   2004-01-24 0.21  10702.5

Data from data:

  DJIAClose DJIADate   GoogleWE   debt
0 10485.18  2004-01-12 2004-01-10 0.210000
1 10528.66  2004-01-20 2004-01-17 0.210000
2 10702.51  2004-01-26 2004-01-24 0.210000 
3 10499.18  2004-02-02 2004-01-31 0.213333
4 10579.03  2004-02-09 2004-02-07 0.200000
5 10714.88  2004-02-17 2004-02-14 0.203333
6 10609.62  2004-02-23 2004-02-21 0.200000
7 10678.14  2004-03-01 2004-02-28 0.200000
8 10529.48  2004-03-08 2004-03-06 0.196667

Data from djia_closes

Date       Close 
2004-01-02 10409.9
2004-01-08 10592.4
2004-01-09 10458.9
2004-01-12 10485.2
2004-01-13 10427.2
2004-01-14 10538.4
2004-01-20 10528.7
2004-01-26 10702.5

data.head()

  DJIAClose   DJIADate   GoogleWE  debt
0 10485.18 2004-01-12 2004-01-10 0.210000
1 10528.66 2004-01-20 2004-01-17 0.210000
2 10702.51 2004-01-26 2004-01-24 0.210000
3 10499.18 2004-02-02 2004-01-31 0.213333
4 10579.03 2004-02-09 2004-02-07 0.200000

djia_closes.head()

  Date    Close
0 2004-01-02 10409.9
1 2004-01-05 10544.1
2 2004-01-06 10538.7
3 2004-01-07 10529.0
4 2004-01-08 10592.4

Solution

  • You use the parse_dates option when reading in the paper csv, but not when reading in the djia csv. That might be causing your problem - the "date" column data is a datetime object, but in djia_close the date is just a string. So there aren't any matches, which is why you're getting an empty dataframe when you merge them.

    Try removing parse_dates from read_csv and see what happens, or try converting both date columns to datetime objects before merging.