pandas Resample function print intermediate candle

When we need to convert minute data to daily data we use following code

df.resample('D').apply({'low': lambda s: s.min(),'high': lambda s: s.max(),'volume': lambda s: s.sum()}).dropna()

It give me one daily candle for each day . Now what i am looking instead of giving me one candle it gives me all candle(Daily Candle) for that day with intermediate state .

let's say day start from 9:30 and close at 16:00 so first daily candle would have open/High/Low/Close of 9:30 Minute candle and at 9:31 open would be of 9:30 close would be of 9:31 and High, Low would be compared and update accordingly.

This i need for Market replay feature so user can see who daily candle is getting change during the day as data start coming.

any help would be really appreciated .

Solution

I misunderstood your question the first time and am supplying a new answer for you.

First let's get some data:

import yfinance as yf

ticker = "msft"
df = yf.download(ticker, period="4d", interval="1m",)
df = df.loc[:, ["Open", "High", "Low", "Close", "Volume"]]

Let's create a new dataframe called df1 that will hold your results. The close line will be the same at every bar since this is the last piece of data, so we will start with that.

df1 = pd.DataFrame()
df1["Close"] = df["Close"]

If we groupby the date, then the open will always be the first value of that date.

df1["Open"] = df.groupby(pd.Grouper(level=0, freq="D"))["Open"].transform("first")

So far we have:

                            Close    Open
Datetime                                 
2020-06-10 09:30:00-04:00 191.260 191.125
2020-06-10 09:31:00-04:00 191.260 191.125
2020-06-10 09:33:00-04:00 191.405 191.125
2020-06-10 09:34:00-04:00 191.490 191.125
2020-06-10 09:35:00-04:00 191.510 191.125
...                           ...     ...
2020-06-15 15:55:00-04:00 188.690 184.580
2020-06-15 15:56:00-04:00 188.700 184.580
2020-06-15 15:57:00-04:00 188.610 184.580
2020-06-15 15:58:00-04:00 188.470 184.580
2020-06-15 15:59:00-04:00 189.170 184.580

The low and high value are calculated using expanding.

df1["Low"] = (
    df.groupby(pd.Grouper(level=0, freq="D"))["Low"].expanding().min().droplevel(0)
)

df1["High"] = (
    df.groupby(pd.Grouper(level=0, freq="D"))["High"].expanding().max().droplevel(0)
)

Finally the volume can be calculated using cumsum over groupby date.

df1["Volume"] = df.groupby(pd.Grouper(level=0, freq="D"))["Volume"].cumsum()

Your final dataframe looks like:

print(df1[["Open", "High", "Low", "Close", "Volume"]])
                             Open    High     Low   Close    Volume
Datetime                                                           
2020-06-10 09:30:00-04:00 191.125 191.390 191.010 191.260    965121
2020-06-10 09:31:00-04:00 191.125 191.450 191.010 191.260   1112327
2020-06-10 09:33:00-04:00 191.125 191.590 191.010 191.405   1433916
2020-06-10 09:34:00-04:00 191.125 191.590 191.010 191.490   1566392
2020-06-10 09:35:00-04:00 191.125 191.800 191.010 191.510   1759454
...                           ...     ...     ...     ...       ...
2020-06-15 15:55:00-04:00 184.580 190.820 184.010 188.690  26253657
2020-06-15 15:56:00-04:00 184.580 190.820 184.010 188.700  26477100
2020-06-15 15:57:00-04:00 184.580 190.820 184.010 188.610  26738140
2020-06-15 15:58:00-04:00 184.580 190.820 184.010 188.470  27120167
2020-06-15 15:59:00-04:00 184.580 190.820 184.010 189.170  27933060

[1421 rows x 5 columns]