You need these 2 files for the short script:
The problem is to get the index to Date. My index is always empty
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import pandas_datareader.data as web
import datetime
paper = pd.read_csv('PreisMoatStanley2013.dat',
delimiter = ' ',
parse_dates=[0,1,100,101])
data = pd.DataFrame({'GoogleWE': paper['Google End Date'],
'debt': paper['debt'].astype(np.float64),
'DJIADate': paper['DJIA Date'],
'DJIAClose': paper['DJIA Closing Price']
.astype(np.float64)})
data.set_index('DJIADate')
print paper [:5]
import quandl
#api_key = open('quandl_key.txt','r').read()
djia = pd.read_csv("djia.csv", index_col=0)
print djia[:3]
djia_closes = djia['Close'].reset_index()
djia_closes.set_index('Date')
print djia_closes[:3]
#s152
data = pd.merge(data, djia_closes,
left_on='DJIADate', right_on='Date')
data.drop(['DJIADate'], inplace=True, axis=1)
data = data.set_index('Date') # Problem seems to be here
print data[:3]
Shell message:
Empty DataFrame
Columns: [DJIAClose, GoogleWE, debt, Close]
Index: []
How it should looks like:
Date /DJIAClose /GoogleWE /debt /Close
2004-01-12 10485.18 2004-01-10 0.21 10485.2
2004-01-20 10528.66 2004-01-17 0.21 10528.7
2004-01-26 10702.51 2004-01-24 0.21 10702.5
Data from data
:
DJIAClose DJIADate GoogleWE debt
0 10485.18 2004-01-12 2004-01-10 0.210000
1 10528.66 2004-01-20 2004-01-17 0.210000
2 10702.51 2004-01-26 2004-01-24 0.210000
3 10499.18 2004-02-02 2004-01-31 0.213333
4 10579.03 2004-02-09 2004-02-07 0.200000
5 10714.88 2004-02-17 2004-02-14 0.203333
6 10609.62 2004-02-23 2004-02-21 0.200000
7 10678.14 2004-03-01 2004-02-28 0.200000
8 10529.48 2004-03-08 2004-03-06 0.196667
Data from djia_closes
Date Close
2004-01-02 10409.9
2004-01-08 10592.4
2004-01-09 10458.9
2004-01-12 10485.2
2004-01-13 10427.2
2004-01-14 10538.4
2004-01-20 10528.7
2004-01-26 10702.5
data.head()
DJIAClose DJIADate GoogleWE debt
0 10485.18 2004-01-12 2004-01-10 0.210000
1 10528.66 2004-01-20 2004-01-17 0.210000
2 10702.51 2004-01-26 2004-01-24 0.210000
3 10499.18 2004-02-02 2004-01-31 0.213333
4 10579.03 2004-02-09 2004-02-07 0.200000
djia_closes.head()
Date Close
0 2004-01-02 10409.9
1 2004-01-05 10544.1
2 2004-01-06 10538.7
3 2004-01-07 10529.0
4 2004-01-08 10592.4
You use the parse_dates
option when reading in the paper
csv, but not when reading in the djia
csv. That might be causing your problem - the "date" column data
is a datetime object, but in djia_close
the date is just a string. So there aren't any matches, which is why you're getting an empty dataframe when you merge them.
Try removing parse_dates
from read_csv
and see what happens, or try converting both date columns to datetime objects before merging.