Search code examples
pythonpandasgroup-byffill

How to forward fill a subset data to have a continuous monthly data


I need to copy data as a forward fill from Jan 2023 to Feb 2023 and March 2023 for SKU1, Location1 since data submissions are missing. For this: 1. need to create a continuous months for each sku, location combination 2. ffill for NA

expected result is shown in below image, the blue highlights are carried forward ensuring there is continuous monthly data

enter image description here

Thanks in Advance!

please refer to the below input data frame,

import pandas as pd

# Sample data
data = {
    'submissionmonth': ['1/1/2023', '1/1/2023', '1/1/2023', '1/1/2023', '1/1/2023', '1/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '10/1/2023', '10/1/2023', '10/1/2023', '10/1/2023', '10/1/2023', '10/1/2023', '10/1/2023', '10/1/2023'],
    'sku': ['SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2'],
    'location': ['Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2'],
    'forecastmonth': ['1/1/2023', '2/1/2023', '3/1/2023', '4/1/2023', '5/1/2023', '6/1/2023', '4/1/2023', '5/1/2023', '6/1/2023', '7/1/2023', '8/1/2023', '9/1/2023', '4/1/2023', '5/1/2023', '6/1/2023', '7/1/2023', '8/1/2023', '9/1/2023', '11/1/2023', '12/1/2023', '1/1/2024', '2/1/2024', '3/1/2024', '4/1/2024', '4/1/2024', '4/1/2024'],
    'cost': [100, 100, 100, 100, 100, 100, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]
}

# Create DataFrame
df = pd.DataFrame(data)

# Convert to datetime
df['submissionmonth'] = pd.to_datetime(df['submissionmonth'], format='%m/%d/%Y')
df['forecastmonth'] = pd.to_datetime(df['forecastmonth'], format='%m/%d/%Y')

print(df.shape)
print(df)

i tried grouping and ffill, resulted in shape error


Solution

  • Looks like you need to deduplicate the rows with groupby.cumcount, then use groupby.resample:

    out = (df
       # there are several "forecastmonth" per combination of date/sku/location
       # we first need to deduplicate them unambiguously
       .assign(n=lambda x: x.groupby(['submissionmonth', 'sku', 'location'])
                            .cumcount())
       .set_index('submissionmonth')
       .groupby(['n', 'sku', 'location'], group_keys=False)
       .resample('MS').ffill()
       .reset_index()[df.columns]
       # below is optional
       .sort_values(by=['submissionmonth', 'sku', 'location', 'forecastmonth'],
                    kind='stable', ignore_index=True)
    )
    

    Output:

       submissionmonth   sku   location forecastmonth  cost
    0       2023-01-01  SKU1  Location1    2023-01-01   100
    1       2023-01-01  SKU1  Location1    2023-02-01   100
    2       2023-01-01  SKU1  Location1    2023-03-01   100
    3       2023-01-01  SKU1  Location1    2023-04-01   100
    4       2023-01-01  SKU1  Location1    2023-05-01   100
    5       2023-01-01  SKU1  Location1    2023-06-01   100
    6       2023-02-01  SKU1  Location1    2023-01-01   100
    7       2023-02-01  SKU1  Location1    2023-02-01   100
    8       2023-02-01  SKU1  Location1    2023-03-01   100
    9       2023-02-01  SKU1  Location1    2023-04-01   100
    10      2023-02-01  SKU1  Location1    2023-05-01   100
    11      2023-02-01  SKU1  Location1    2023-06-01   100
    12      2023-03-01  SKU1  Location1    2023-01-01   100
    13      2023-03-01  SKU1  Location1    2023-02-01   100
    14      2023-03-01  SKU1  Location1    2023-03-01   100
    15      2023-03-01  SKU1  Location1    2023-04-01   100
    16      2023-03-01  SKU1  Location1    2023-05-01   100
    17      2023-03-01  SKU1  Location1    2023-06-01   100
    18      2023-04-01  SKU1  Location1    2023-04-01   200
    19      2023-04-01  SKU1  Location1    2023-05-01   200
    20      2023-04-01  SKU1  Location1    2023-06-01   200
    21      2023-04-01  SKU1  Location1    2023-07-01   200
    22      2023-04-01  SKU1  Location1    2023-08-01   200
    23      2023-04-01  SKU1  Location1    2023-09-01   200
    24      2023-04-01  SKU2  Location2    2023-04-01   200
    25      2023-04-01  SKU2  Location2    2023-05-01   200
    26      2023-04-01  SKU2  Location2    2023-06-01   200
    27      2023-04-01  SKU2  Location2    2023-07-01   200
    28      2023-04-01  SKU2  Location2    2023-08-01   200
    29      2023-04-01  SKU2  Location2    2023-09-01   200
    30      2023-05-01  SKU2  Location2    2023-04-01   200
    31      2023-05-01  SKU2  Location2    2023-05-01   200
    32      2023-05-01  SKU2  Location2    2023-06-01   200
    33      2023-05-01  SKU2  Location2    2023-07-01   200
    34      2023-05-01  SKU2  Location2    2023-08-01   200
    35      2023-05-01  SKU2  Location2    2023-09-01   200
    36      2023-06-01  SKU2  Location2    2023-04-01   200
    37      2023-06-01  SKU2  Location2    2023-05-01   200
    38      2023-06-01  SKU2  Location2    2023-06-01   200
    39      2023-06-01  SKU2  Location2    2023-07-01   200
    40      2023-06-01  SKU2  Location2    2023-08-01   200
    41      2023-06-01  SKU2  Location2    2023-09-01   200
    42      2023-07-01  SKU2  Location2    2023-04-01   200
    43      2023-07-01  SKU2  Location2    2023-05-01   200
    44      2023-07-01  SKU2  Location2    2023-06-01   200
    45      2023-07-01  SKU2  Location2    2023-07-01   200
    46      2023-07-01  SKU2  Location2    2023-08-01   200
    47      2023-07-01  SKU2  Location2    2023-09-01   200
    48      2023-08-01  SKU2  Location2    2023-04-01   200
    49      2023-08-01  SKU2  Location2    2023-05-01   200
    50      2023-08-01  SKU2  Location2    2023-06-01   200
    51      2023-08-01  SKU2  Location2    2023-07-01   200
    52      2023-08-01  SKU2  Location2    2023-08-01   200
    53      2023-08-01  SKU2  Location2    2023-09-01   200
    54      2023-09-01  SKU2  Location2    2023-04-01   200
    55      2023-09-01  SKU2  Location2    2023-05-01   200
    56      2023-09-01  SKU2  Location2    2023-06-01   200
    57      2023-09-01  SKU2  Location2    2023-07-01   200
    58      2023-09-01  SKU2  Location2    2023-08-01   200
    59      2023-09-01  SKU2  Location2    2023-09-01   200
    60      2023-10-01  SKU2  Location2    2023-11-01   200
    61      2023-10-01  SKU2  Location2    2023-12-01   200
    62      2023-10-01  SKU2  Location2    2024-01-01   200
    63      2023-10-01  SKU2  Location2    2024-02-01   200
    64      2023-10-01  SKU2  Location2    2024-03-01   200
    65      2023-10-01  SKU2  Location2    2024-04-01   200
    66      2023-10-01  SKU2  Location2    2024-04-01   200
    67      2023-10-01  SKU2  Location2    2024-04-01   200