Search code examples
pythonpandasdatetimeseries

How to replace hard-coded indexes with variables that work with different datasets?


I'd love some advice on how I can abide by the rule saying that my code has to work for any data. I tried to use indexing to find the biggest loss date for the question below, it didn't work. I ended up just hard coding the dates into the 'losshigh' variable, but I don't think that's sufficient. We also can't use for loops.

The 'stock' dataset we are working with is a pandas Series with a DateTime index and float values.

"Find the worst buy and sell date combo, aka. the two days where the percent lost when purchasing x and selling in y (with y after x) is maximized." Hint: Use method cummax() or method cummin().

Here is what I have so far.

====

#Reproducing stock.

import pandas as pd
import numpy as np
data = pd.read_csv('NVDA.csv',index_col=0, parse_dates=True)
stock = data['Close']
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
pd.set_option('display.float_format', lambda x: '%.2f' % x)
pd.set_option('display.max_columns', None)

====

#Find the highest index of x that comes before the lowest index of y

#The worst ratio of high stock in the past to whatever stock price is now.
bigloss=stock.cummax()/stock
biggestloss=bigloss.nlargest(1)

#Defining y.
y=stock.loc[biggestloss.index]

#The highest stock up to the biggest loss date.
losshigh=stock.loc['1999-01-22':'2002-10-09'].max()

#Defining x.
x=stock[stock==losshigh]

#The worst percentage.
percentage=(x[0]-y[0]/x[0])*100

Solution

  • Okay, so first a disclaimer: I interpreted this as what is the maximum total loss you can make from buying on one day, and selling on another. The tutors could be asking for the maximum loss per day from buying and selling, or the max percentage loss, or max percentage loss per day, but those are different problems.

    But anyway:

    # first, we import pandas
    import pandas as pd
    
    # second, create sample data
    stock = pd.Series([100, 120, 90, 30, 70, 150, 110, 100, 80, 90], index=pd.date_range(start='2023-01-01', periods=10, freq='D'), name='Stock_Price')
    
    # cummax() finds the minimum value in the data *so far*
    >>> stock.cummax()
    2023-01-01    100
    2023-01-02    120
    2023-01-03    120
    2023-01-04    120
    2023-01-05    120
    2023-01-06    150
    2023-01-07    150
    2023-01-08    150
    2023-01-09    150
    2023-01-10    150
    

    So basically, here's the idea: if on a particular day, the difference between the cummax and the stock price of that day is large, it means that we could have bought it in the past, at the higher price , and then sold it today. If the price today is lower than that maximum, then we'll lose money. Here is the results for our dataset:

    >>> stock - stock.cummax() 
    2023-01-01     0
    2023-01-02     0
    2023-01-03   -30
    2023-01-04   -90
    2023-01-05   -50
    2023-01-06     0
    2023-01-07   -40
    2023-01-08   -50
    2023-01-09   -70
    2023-01-10   -60
    

    Now, we can see just by looking, the worst day to sell would be on 2023-01-04. We can get that value by using idxmin():

    >>> sell_day = (stock - stock.cummax()).idxmin()
    Timestamp('2023-01-04 00:00:00')
    

    Now to get the data points from the beginning to the selling day, and then finding the highest price within those days, which has to be our buying day:

    >>> stock[:sell_day]
    2023-01-01    100
    2023-01-02    120
    2023-01-03     90
    2023-01-04     30
    
    >>> buy_day = stock.loc[:sell_day].idxmax()
    Timestamp('2023-01-02 00:00:00')
    

    Now finally, to get the loss, we can calculate that using simple arithmetic:

    >>> 100 * (stock[sell_day] - stock[buy_day]) / stock[sell_day]
    -300.0
    

    In the example dataset, you'd lose 300% by buying on 2023-01-02 at $120, and selling on 2023-01-04 at $30.