Isolating/manipulating data in a pandas Dataframe

Using pandas dataframe I was able to access stock data from yahoo finance. I managed to organize it on a monthly basis as I am attempting to see how a stock performs in a given month over a multi year period. Specifically, how many times a given month has a positive price increase within the period (If the price always is up in the month of December, I would want to see the % and the # of occurrences). I'm currently using one stock 'MSFT' to test but ideally it would analyze many more. My problem: I can't seem to access the data in the dataframe for this purpose.

Here is the code:

from datetime import datetime
import pandas_datareader.data as web
import pandas as pd
from pandas.tseries.offsets import CustomBusinessMonthBegin

input_file = ['MSFT'] 
money_list = []

for ticker in input_file:

    data = web.DataReader(ticker, "yahoo", datetime(2014,1,1), datetime(2015,12,31))

    monthly_data = data.index.to_period('M')
    min_day_in_month_index = pd.to_datetime(data.set_index(monthly_data, append=True).reset_index(level=0).groupby(level=0)['Open'].min())
    custom_month_starts = CustomBusinessMonthBegin(calendar = min_day_in_month_index)

    site_dict = {'Open':'first','High':'max','Low':'min','Close': 'last','Volume': 'sum','Adj Close': 'last'}
    mthly_results = data.resample(custom_month_starts, how=site_dict)

    month = { 0:'JAN', 1:'FEB', 2:'MAR', 3:'APR', 4:'MAY',5:'JUN', 6:'JUL', 7:'AUG', 8:'SEP', 9:'OCT',10:'NOV', 11:'DEC' }

    print mthly_results

Solution

The first part of your question can be simplified greatly. (Your code is overly complex).

First you find the minimum for each month. Then you join back into the original data set. You now have the day on which the minimum occurred as well as the other columns.

The second part of your question is more tricky. To find the percent increase and decrease you need a little more machinery.

I added a class to keep track of the increases and decreases for a given month. Then I iterate over all the minimums and find the percent change from month to month. Then print it in a nice dataFrame.

from datetime import datetime
import pandas.io.data as web
import pandas as pd

class MonthlyChange(object):
    months = { 0:'JAN', 1:'FEB', 2:'MAR', 3:'APR', 4:'MAY',5:'JUN', 6:'JUL', 7:'AUG', 8:'SEP', 9:'OCT',10:'NOV', 11:'DEC' }

    def __init__(self,month):
        self.month = MonthlyChange.months[month-1]
        self.sum_of_pos_changes=0
        self.sum_of_neg_changes=0
        self.total_neg=0
        self.total_pos=0
    def add_change(self,change):
        if change < 0:
            self.sum_of_neg_changes+=change
            self.total_neg+=1
        elif change > 0:
            self.sum_of_pos_changes+=change
            self.total_pos+=1
    def get_data(self):
        if self.total_pos == 0:
            return (self.month,0.0,0,self.sum_of_neg_changes/self.total_neg,self.total_neg)
        elif self.total_neg == 0:
            return (self.month,self.sum_of_pos_changes/self.total_pos,self.total_pos,0.0,0)
        else:
            return (self.month,self.sum_of_pos_changes/self.total_pos,self.total_pos,self.sum_of_neg_changes/self.total_neg,self.total_neg)


for ticker in ['MSFT'] :

    data = web.DataReader(ticker, "yahoo", datetime(2014,1,1), datetime(2015,12,31))
    data['ymd'] = data.index
    year_month = data.index.to_period('M')
    data['year_month'] = year_month
    first_day_of_months = data.groupby(["year_month"])["ymd"].min()
    first_day_of_months = first_day_of_months.to_frame().reset_index(level=0)
    last_day_of_months = data.groupby(["year_month"])["ymd"].max()
    last_day_of_months = last_day_of_months.to_frame().reset_index(level=0)
    fday_open = data.merge(first_day_of_months,on=['ymd'])
    fday_open = fday_open[['year_month_x','Open']]
    lday_open = data.merge(last_day_of_months,on=['ymd'])
    lday_open = lday_open[['year_month_x','Open']]

    fday_lday = fday_open.merge(lday_open,on=['year_month_x'])
    monthly_changes = {i:MonthlyChange(i) for i in range(1,13)}
    for index,ym, openf,openl in fday_lday.itertuples():
        month = ym.strftime('%m')
        month = int(month)
        diff = (openf-openl)/openf
        monthly_changes[month].add_change(diff)
    changes_df = pd.DataFrame([monthly_changes[i].get_data() for i in monthly_changes],columns=["Month","Avg Inc.","#Inc","Avg.Dec","#Dec"])
    print(changes_df)

Explanation:

The code first grabs the data from the web.

data = web.DataReader(ticker, "yahoo", datetime(2014,1,1), datetime(2015,12,31))

The data looks like this.

Date, Open, High, Low, Close, Volume, Adj Close

2014-10-29, 46.43,46.70,46.34,46.61,3027610,44.77

Then it gets the year and month for every row in the table.

year_month = data.index.to_period('M')

Then it adds a new column in the table that has the year and month only.

data["year_month"] = year_month

The data now looks like this

Date, Open, year_month, ... (rest of columns)

2015-03-11, 42.31, 2015-03 ...

Next we group on the column year_month. Then we find the minimum value in the column Open for each month. We need to do a groupby because need to collect all the days in the month into a group so that we can take the min.

monthly_mins = data.groupby(["year_month"])["Open"].min()

Now we have all the monthly minimums in a single table. We know the day on which the minimum occurred. But we don't know what all the other columns are. So we join this back into the main table and we get only the monthly minimums.

month_min_open_df = data.merge(monthly_mins,on=["year_month","Open"])