I am trying to adjust my portfolio of stocks and trying to calculate the adjusted mean price (which is a form of weighted-average).
Here is sample data:
import pandas as pd
import numpy as np
sample_dict = {'ticker': {1: 'ABCD',
2: 'ABCD', 3: 'ABCD', 4: 'ABCD', 5: 'ABCD', 6: 'ABCD', 8: 'EFGH',
9: 'EFGH', 10: 'EFGH', 11: 'EFGH', 12: 'EFGH', 13: 'EFGH'},
'Date': {1: "2018, 1, 10", 2: "2018, 1, 20",
3: "2018, 2, 7", 4: "2018, 4, 14",
5: "2018, 5, 25", 6: "2018, 7, 4",
8: "2018, 1, 10", 9: "2018, 1, 20",
10: "2018, 2, 7", 11: "2018, 4, 14",
12: "2018, 5, 25", 13: "2018, 7, 4"},
'Sell_buy': {1: 'buy', 2: 'buy', 3: 'sell', 4: 'buy', 5: 'sell', 6: 'buy',
8: 'buy', 9: 'buy', 10: 'buy', 11: 'buy', 12: 'sell', 13: 'sell'},
'Qtd': {1: 100.0, 2: 300.0, 3: 200.0, 4: 500.0, 5: 600.0, 6: 500.0,
8: 300.0, 9: 300.0, 10: 200.0, 11: 200.0, 12: 700.0, 13: 100.0},
'Price': {1: 8.0, 2: 10.0, 3: 12.0, 4: 9.0, 5: 13.0, 6: 14.0,
8: 8.0, 9: 10.0, 10: 12.0, 11: 9.0, 12: 13.0, 13: 14.0},
'Costs': {1: 10.93, 2: 12.52, 3: 11.39, 4: 14.5, 5: 14.68, 6: 14.96,
8: 10.93, 9: 12.52, 10: 11.39, 11: 14.5, 12: 14.68, 13: 14.96}}
sample_df = pd.DataFrame(sample_dict)
sample_df['Date']=pd.to_datetime(sample_df['Date'], dayfirst=True).dt.date
I was able to get the get an updated Adjusted quantity (based on buys and sells) without any issues:
#to calculate adjusted quantity. this works as expected
sample_df['Adj Qtd'] = sample_df.apply(lambda x: ((x.Sell_buy == "buy") - (x.Sell_buy == "sell")) * x['Qtd'], axis = 1)
sample_df['Adj Qtd'] = sample_df.groupby('ticker')['Adj Qtd'].cumsum()
However, I am not able to get the correct Adjusted Price. The condition here is if I sell a stock, my Adjusted Price should not change and remain the same as the last adjusted price when the buy was made for that stock.
I've tried the following to achieve this goal:
#Adjust Price. Works good until I reach the row where a sell was made
sample_df['Adjusted Price'] = sample_df.apply(lambda x: ((x.Sell_buy == "buy") - (x.Sell_buy == "sell")) * (x["Price"] * x["Qtd"] + x["Costs"]), axis = 1)
sample_df['Adjusted Price'] = sample_df.groupby('ticker')['Adjusted Price'].cumsum().div(sample_df['Adj Qtd'])
I could partially correct the adjusted prices on sell rows with the following:
# When it's a "sell", adjusted price is the same from above
sample_df.loc[sample_df['Sell_buy'] == 'sell',['Adjusted Price']] = np.NaN
sample_df.fillna(method='ffill', inplace=True)
ticker Date Sell_buy Qtd Price Costs Adj Qtd Adjusted Price
1 ABCD 2018-10-01 buy 100.0 8.0 10.93 100.0 8.109300
2 ABCD 2018-01-20 buy 300.0 10.0 12.52 400.0 9.558625
3 ABCD 2018-07-02 sell 200.0 12.0 11.39 200.0 9.558625
4 ABCD 2018-04-14 buy 500.0 9.0 14.50 700.0 8.466514
5 ABCD 2018-05-25 sell 600.0 13.0 14.68 100.0 8.466514
6 ABCD 2018-04-07 buy 500.0 14.0 14.96 600.0 8.544733
8 EFGH 2018-10-01 buy 300.0 8.0 10.93 300.0 8.036433
9 EFGH 2018-01-20 buy 300.0 10.0 12.52 600.0 9.039083
10 EFGH 2018-07-02 buy 200.0 12.0 11.39 800.0 9.793550
11 EFGH 2018-04-14 buy 200.0 9.0 14.50 1000.0 9.649340
12 EFGH 2018-05-25 sell 700.0 13.0 14.68 300.0 9.649340
13 EFGH 2018-04-07 sell 100.0 14.0 14.96 200.0 9.649340
This would work perfectly if there is no "sell" between "buy"s (as it did for the stock EFGH in this example). To be clear, when the transaction is a "sell", we must ignore adjusting the price and use the last adjusted price in the last buy type transaction for that particular stock.
I did all these calculus in excel and the output should be the following:
For further clarification, the excel formula for that selected cell was:
=IF(C3="buy";(I2*G2+D3*E3+F3)/G3;IF(G3<>0;I2;0))
I've also tried .groupby("ticker").apply() a function using shift() in order to use prior values from rows above, but I've failed.
I couldn't figure out a nice, simple solution to this. The problem is that calculating adjusted price depends on the previous value of adjusted price, which prevents the use of vectorization or shift().
So, here's the ugly solution. :)
First step is to use a groupby, to separate it by ticker symbol. Then, it loops over all the rows in that group, and calculates a weighted average of the price to get the current shares, and the previous price. Then, it adds that list as a column in the dataframe.
def weighted_average(a, b, a_weight):
"""Take an average of a and b, with a weighted by a_weight"""
assert 0 <= a_weight <= 1
return a * a_weight + b * (1 - a_weight)
def get_adjusted_price_for_ticker(single_ticker_df):
adjusted_price = 0
current_shares = 0
prices = []
for _, row in single_ticker_df.iterrows():
is_buy = row["Sell_buy"] == "buy"
qtd = row["Qtd"]
if is_buy:
current_shares += qtd
cost_per_share = (qtd * row["Price"] + row["Costs"]) / qtd
proportion_of_new_shares = qtd / current_shares
adjusted_price = weighted_average(cost_per_share, adjusted_price, proportion_of_new_shares)
else:
current_shares -= qtd
prices.append(adjusted_price)
single_ticker_df["Adjusted Price"] = prices
return single_ticker_df
def get_adjusted_price(df):
return df.groupby("ticker").apply(get_adjusted_price_for_ticker)
get_adjusted_price(sample_df)
Output:
ticker Date Sell_buy Qtd Price Costs Adjusted Price
1 ABCD 2018-10-01 buy 100.0 8.0 10.93 8.109300
2 ABCD 2018-01-20 buy 300.0 10.0 12.52 9.558625
3 ABCD 2018-07-02 sell 200.0 12.0 11.39 9.558625
4 ABCD 2018-04-14 buy 500.0 9.0 14.50 9.180321
5 ABCD 2018-05-25 sell 600.0 13.0 14.68 9.180321
6 ABCD 2018-04-07 buy 500.0 14.0 14.96 13.221654
8 EFGH 2018-10-01 buy 300.0 8.0 10.93 8.036433
9 EFGH 2018-01-20 buy 300.0 10.0 12.52 9.039083
10 EFGH 2018-07-02 buy 200.0 12.0 11.39 9.793550
11 EFGH 2018-04-14 buy 200.0 9.0 14.50 9.649340
12 EFGH 2018-05-25 sell 700.0 13.0 14.68 9.649340
13 EFGH 2018-04-07 sell 100.0 14.0 14.96 9.649340