I am learning Python and pandas for data analysis, and I am programming some investment ideas as exercises. pandas has a nice io.data module for pulling data from online sources such as Yahoo and Google. However, its readers all take a start date, which defaults to 2010-01-01 when none is given, as specified in the following code in data.py
http://github.com/pydata/pandas/blob/master/pandas/io/data.py:
def _sanitize_dates(start, end):
    from pandas.core.datetools import to_datetime
    start = to_datetime(start)
    end = to_datetime(end)
    if start is None:
        start = dt.datetime(2010, 1, 1)
    if end is None:
        end = dt.datetime.today()
    return start, end
Since every stock IPO'd on a different date, it is impractical to look up and specify the right start date for each ticker. Wouldn't it be nice to have an option that tells pandas to read ALL available data? Even for a company that has been public for 50 years, daily data is only about 50 * 250 = 12,500 rows (roughly 250 trading days per year). Python should be able to handle that, right?
Thank you for your help. And my salute to Wes and other pandas contributors; pandas is great!
A simple solution is to pass a common start date that is earlier than any data you expect to exist. 1 January 1970 seems like a fair choice: the source simply returns whatever it has from the first available date onward.
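If you do this for many tickers, the idea can be wrapped in a small helper. The following is only a minimal sketch: `read_full_history`, `EARLIEST`, and `fake_reader` are hypothetical names introduced for illustration, and `fake_reader` is a stand-in with synthetic data so the example runs offline. In real use you would pass `DataReader` itself as the `reader` argument.

```python
from datetime import datetime

import pandas as pd

# Assumed "earliest possible" date; anything older would be silently missed.
EARLIEST = datetime(1970, 1, 1)


def read_full_history(symbol, reader, source="yahoo", earliest=EARLIEST, end=None):
    """Request everything from `earliest` up to `end` (today if None).

    `reader` is any callable with a DataReader-like signature:
    reader(symbol, source, start, end) -> DataFrame indexed by date.
    """
    if end is None:
        end = datetime.today()
    return reader(symbol, source, earliest, end)


def fake_reader(symbol, source, start, end):
    """Hypothetical offline stand-in for DataReader (synthetic data)."""
    # Pretend the ticker's first available date is 1984-09-07.
    listed = pd.bdate_range("1984-09-07", "1984-12-31")
    prices = pd.DataFrame({"Close": list(range(len(listed)))}, index=listed)
    # Mimic the data source: return only rows inside the requested window.
    return prices.loc[start:end]


df = read_full_history("AAPL", fake_reader)
first_date = df.index.min()  # first date the source actually returned
print(first_date)
```

The first index entry tells you where the available history starts, so you never need to know the IPO date up front.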
In [55]: from pandas.io.data import DataReader
In [56]: from datetime import datetime
In [57]: df_1 = DataReader("AAPL", "yahoo", datetime(1970,1,1), datetime(2013,10,1))
In [58]: df_1
Out[58]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7330 entries, 1984-09-07 00:00:00 to 2013-10-01 00:00:00
Data columns (total 6 columns):
Open         7330  non-null values
High         7330  non-null values
Low          7330  non-null values
Close        7330  non-null values
Volume       7330  non-null values
Adj Close    7330  non-null values
dtypes: float64(5), int64(1)
Now choose 1984-09-07, the first date in the index above, as the start date. The request pulls the same data, so we end up with the same DataFrame.
In [59]: df_2 = DataReader("AAPL", "yahoo", datetime(1984,9,7), datetime(2013,10,1))
In [60]: df_2
Out[60]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7330 entries, 1984-09-07 00:00:00 to 2013-10-01 00:00:00
Data columns (total 6 columns):
Open         7330  non-null values
High         7330  non-null values
Low          7330  non-null values
Close        7330  non-null values
Volume       7330  non-null values
Adj Close    7330  non-null values
dtypes: float64(5), int64(1)
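Comparing the two summaries by eye works, but the check can also be done programmatically with `DataFrame.equals`. The sketch below uses a small synthetic frame rather than real AAPL data (the `history`, `df_early`, and `df_exact` names are made up for illustration), and it slices that frame locally instead of calling `DataReader`, so it only illustrates the comparison itself.

```python
import pandas as pd

# Synthetic stand-in for downloaded prices (not real AAPL data).
dates = pd.bdate_range("1984-09-07", "1984-12-31")
history = pd.DataFrame(
    {"Close": [float(i) for i in range(len(dates))]}, index=dates
)

# Same end date, two different start dates, as in the sessions above.
df_early = history.loc["1970-01-01":"1984-12-31"]
df_exact = history.loc["1984-09-07":"1984-12-31"]

# equals() is True only when shape, index, and values all match.
same = df_early.equals(df_exact)
print(same)
```

With the real data, the equivalent check would be `df_1.equals(df_2)`.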