Search code examples
pythonpandasdataframeresample

Dataframe Resample date value keeping 'prices'


here a sample my dataframe :

    date début  € / mois enerc  € / mois edf
0   2021-04-01  40.86   8.46
1   2021-04-10  40.86   8.46
2   2021-04-16  33.69   8.46
3   2021-06-10  33.69   8.46
4   2021-08-01  37.71   9.35
5   2021-08-10  37.74   9.35

I want to resample date for daily rule, but keeping the price from when they change, until they change again, for example :

2021-04-10  40.86   8.46
2021-04-11  40.86   8.46
2021-04-12  40.86   8.46
2021-04-13  40.86   8.46
2021-04-14  40.86   8.46
2021-04-15  40.86   8.46
2021-04-16  33.69   8.46
2021-04-17  33.69   8.46
2021-04-18  33.69   8.46

etc. I don't want to interpolate, just to copy and change at the good period.


Solution

  • you can do the following:

    • find minimum and maximum dates
    • create an array of dates from min to max increasing day by day and convert it to a dataframe
    • merge with the original data frame how='letf'
    • fillna values using forward fill
    from datetime import datetime , timedelta
    import pandas as pd
    cols = ["date début  €", "mois enerc  €", "mois edf"]
    data = [["2021-04-01",  40.86,   8.46],
            ["2021-04-10",  40.86,   8.46],
            ["2021-04-16",  33.69,   8.46],
            ["2021-06-10",  33.69,   8.46],
            ["2021-08-01",  37.71,   9.35],
            ["2021-08-10",  37.74,   9.35]]
    
    df = pd.DataFrame(data, columns=cols)
    df["date début  €"] = pd.to_datetime(df["date début  €"])
    
    # find  min and max values of date
    start_date = df["date début  €"].min()
    end_date = df["date début  €"].max()
    number_of_days = (end_date - start_date).days
    
    # create array of dates
    date_list=[(start_date + timedelta(days=days)).strftime('%Y-%m-%d') for days in range(number_of_days)]
    # convert array to dataframe
    df2 = pd.DataFrame(date_list, columns=["date début  €"])
    df2["date début  €"] = pd.to_datetime(df2["date début  €"])
    
    # merge and forwardfill nans
    df2.merge(df, on="date début  €", how='left').fillna(method='ffill')
    

    output

    10  2021-04-11  40.86   8.46
    11  2021-04-12  40.86   8.46
    12  2021-04-13  40.86   8.46
    13  2021-04-14  40.86   8.46
    14  2021-04-15  40.86   8.46
    15  2021-04-16  33.69   8.46
    16  2021-04-17  33.69   8.46
    17  2021-04-18  33.69   8.46
    18  2021-04-19  33.69   8.46