Search code examples
pythonpandasfinance

pandas Resample function print intermediate candle


When we need to convert minute data to daily data we use following code

df.resample('D').apply({'low': lambda s: s.min(),'high': lambda s: s.max(),'volume': lambda s: s.sum()}).dropna() 

It give me one daily candle for each day . Now what i am looking instead of giving me one candle it gives me all candle(Daily Candle) for that day with intermediate state .

let's say day start from 9:30 and close at 16:00 so first daily candle would have open/High/Low/Close of 9:30 Minute candle and at 9:31 open would be of 9:30 close would be of 9:31 and High, Low would be compared and update accordingly.

This i need for Market replay feature so user can see who daily candle is getting change during the day as data start coming.

any help would be really appreciated .


Solution

  • I misunderstood your question the first time and am supplying a new answer for you.

    First let's get some data:

    import yfinance as yf
    
    ticker = "msft"
    df = yf.download(ticker, period="4d", interval="1m",)
    df = df.loc[:, ["Open", "High", "Low", "Close", "Volume"]]
    

    Let's create a new dataframe called df1 that will hold your results. The close line will be the same at every bar since this is the last piece of data, so we will start with that.

    df1 = pd.DataFrame()
    df1["Close"] = df["Close"]
    

    If we groupby the date, then the open will always be the first value of that date.

    df1["Open"] = df.groupby(pd.Grouper(level=0, freq="D"))["Open"].transform("first")
    

    So far we have:

                                Close    Open
    Datetime                                 
    2020-06-10 09:30:00-04:00 191.260 191.125
    2020-06-10 09:31:00-04:00 191.260 191.125
    2020-06-10 09:33:00-04:00 191.405 191.125
    2020-06-10 09:34:00-04:00 191.490 191.125
    2020-06-10 09:35:00-04:00 191.510 191.125
    ...                           ...     ...
    2020-06-15 15:55:00-04:00 188.690 184.580
    2020-06-15 15:56:00-04:00 188.700 184.580
    2020-06-15 15:57:00-04:00 188.610 184.580
    2020-06-15 15:58:00-04:00 188.470 184.580
    2020-06-15 15:59:00-04:00 189.170 184.580
    

    The low and high value are calculated using expanding.

    df1["Low"] = (
        df.groupby(pd.Grouper(level=0, freq="D"))["Low"].expanding().min().droplevel(0)
    )
    
    df1["High"] = (
        df.groupby(pd.Grouper(level=0, freq="D"))["High"].expanding().max().droplevel(0)
    )
    

    Finally the volume can be calculated using cumsum over groupby date.

    df1["Volume"] = df.groupby(pd.Grouper(level=0, freq="D"))["Volume"].cumsum()
    

    Your final dataframe looks like:

    print(df1[["Open", "High", "Low", "Close", "Volume"]])
                                 Open    High     Low   Close    Volume
    Datetime                                                           
    2020-06-10 09:30:00-04:00 191.125 191.390 191.010 191.260    965121
    2020-06-10 09:31:00-04:00 191.125 191.450 191.010 191.260   1112327
    2020-06-10 09:33:00-04:00 191.125 191.590 191.010 191.405   1433916
    2020-06-10 09:34:00-04:00 191.125 191.590 191.010 191.490   1566392
    2020-06-10 09:35:00-04:00 191.125 191.800 191.010 191.510   1759454
    ...                           ...     ...     ...     ...       ...
    2020-06-15 15:55:00-04:00 184.580 190.820 184.010 188.690  26253657
    2020-06-15 15:56:00-04:00 184.580 190.820 184.010 188.700  26477100
    2020-06-15 15:57:00-04:00 184.580 190.820 184.010 188.610  26738140
    2020-06-15 15:58:00-04:00 184.580 190.820 184.010 188.470  27120167
    2020-06-15 15:59:00-04:00 184.580 190.820 184.010 189.170  27933060
    
    [1421 rows x 5 columns]