Using pandas dataframe I was able to access stock data from yahoo finance. I managed to organize it on a monthly basis as I am attempting to see how a stock performs in a given month over a multi year period. Specifically, how many times a given month has a positive price increase within the period (If the price always is up in the month of December, I would want to see the % and the # of occurrences). I'm currently using one stock 'MSFT' to test but ideally it would analyze many more. My problem: I can't seem to access the data in the dataframe for this purpose.
Here is the code:
from datetime import datetime
import pandas_datareader.data as web
import pandas as pd
from pandas.tseries.offsets import CustomBusinessMonthBegin
input_file = ['MSFT']
money_list = []
for ticker in input_file:
data = web.DataReader(ticker, "yahoo", datetime(2014,1,1), datetime(2015,12,31))
monthly_data = data.index.to_period('M')
min_day_in_month_index = pd.to_datetime(data.set_index(monthly_data, append=True).reset_index(level=0).groupby(level=0)['Open'].min())
custom_month_starts = CustomBusinessMonthBegin(calendar = min_day_in_month_index)
site_dict = {'Open':'first','High':'max','Low':'min','Close': 'last','Volume': 'sum','Adj Close': 'last'}
mthly_results = data.resample(custom_month_starts, how=site_dict)
month = { 0:'JAN', 1:'FEB', 2:'MAR', 3:'APR', 4:'MAY',5:'JUN', 6:'JUL', 7:'AUG', 8:'SEP', 9:'OCT',10:'NOV', 11:'DEC' }
print mthly_results
The first part of your question can be simplified greatly. (Your code is overly complex).
First you find the minimum for each month. Then you join back into the original data set. You now have the day on which the minimum occurred as well as the other columns.
The second part of your question is more tricky. To find the percent increase and decrease you need a little more machinery.
I added a class to keep track of the increases and decreases for a given month. Then I iterate over all the minimums and find the percent change from month to month. Then print it in a nice dataFrame.
from datetime import datetime
import pandas.io.data as web
import pandas as pd
class MonthlyChange(object):
months = { 0:'JAN', 1:'FEB', 2:'MAR', 3:'APR', 4:'MAY',5:'JUN', 6:'JUL', 7:'AUG', 8:'SEP', 9:'OCT',10:'NOV', 11:'DEC' }
def __init__(self,month):
self.month = MonthlyChange.months[month-1]
self.sum_of_pos_changes=0
self.sum_of_neg_changes=0
self.total_neg=0
self.total_pos=0
def add_change(self,change):
if change < 0:
self.sum_of_neg_changes+=change
self.total_neg+=1
elif change > 0:
self.sum_of_pos_changes+=change
self.total_pos+=1
def get_data(self):
if self.total_pos == 0:
return (self.month,0.0,0,self.sum_of_neg_changes/self.total_neg,self.total_neg)
elif self.total_neg == 0:
return (self.month,self.sum_of_pos_changes/self.total_pos,self.total_pos,0.0,0)
else:
return (self.month,self.sum_of_pos_changes/self.total_pos,self.total_pos,self.sum_of_neg_changes/self.total_neg,self.total_neg)
for ticker in ['MSFT'] :
data = web.DataReader(ticker, "yahoo", datetime(2014,1,1), datetime(2015,12,31))
data['ymd'] = data.index
year_month = data.index.to_period('M')
data['year_month'] = year_month
first_day_of_months = data.groupby(["year_month"])["ymd"].min()
first_day_of_months = first_day_of_months.to_frame().reset_index(level=0)
last_day_of_months = data.groupby(["year_month"])["ymd"].max()
last_day_of_months = last_day_of_months.to_frame().reset_index(level=0)
fday_open = data.merge(first_day_of_months,on=['ymd'])
fday_open = fday_open[['year_month_x','Open']]
lday_open = data.merge(last_day_of_months,on=['ymd'])
lday_open = lday_open[['year_month_x','Open']]
fday_lday = fday_open.merge(lday_open,on=['year_month_x'])
monthly_changes = {i:MonthlyChange(i) for i in range(1,13)}
for index,ym, openf,openl in fday_lday.itertuples():
month = ym.strftime('%m')
month = int(month)
diff = (openf-openl)/openf
monthly_changes[month].add_change(diff)
changes_df = pd.DataFrame([monthly_changes[i].get_data() for i in monthly_changes],columns=["Month","Avg Inc.","#Inc","Avg.Dec","#Dec"])
print(changes_df)
Explanation:
The code first grabs the data from the web.
data = web.DataReader(ticker, "yahoo", datetime(2014,1,1), datetime(2015,12,31))
The data looks like this.
Date, Open, High, Low, Close, Volume, Adj Close
2014-10-29, 46.43,46.70,46.34,46.61,3027610,44.77
Then it gets the year and month for every row in the table.
year_month = data.index.to_period('M')
Then it adds a new column in the table that has the year and month only.
data["year_month"] = year_month
The data now looks like this
Date, Open, year_month, ... (rest of columns)
2015-03-11, 42.31, 2015-03 ...
Next we group on the column year_month. Then we find the minimum value in the column Open for each month. We need to do a groupby because need to collect all the days in the month into a group so that we can take the min.
monthly_mins = data.groupby(["year_month"])["Open"].min()
Now we have all the monthly minimums in a single table. We know the day on which the minimum occurred. But we don't know what all the other columns are. So we join this back into the main table and we get only the monthly minimums.
month_min_open_df = data.merge(monthly_mins,on=["year_month","Open"])