Tags: python, pandas, dataframe, group-by, method-chaining

How to chain several transformations, including creating a new column?


Edited to comply with the rules:

How can I chain the following code? I can't seem to add a new column within a chain.

Input columns of Dataset:

ORDER, ITEM_SERIAL, DATE

%%time

df = Dataset.groupby(['ORDER', pd.Grouper(key='DATE', freq='7d')]).agg(
    Min=('DATE', 'min'),
    Max=('DATE', 'max'),
    ITEM_SERIAL_Min=('ITEM_SERIAL', 'first'),
    ITEM_SERIAL_Max=('ITEM_SERIAL', 'last'),
).reset_index()

df['DIFFERENCE'] = df['Max'] - df['Min']
df = df[df['DIFFERENCE']!='0s'].sort_values('DIFFERENCE')
df = df.drop('DATE', axis=1)
df['ORDER'] = df['ORDER'] + '_' + df['Min'].map(str).str[:10]

df

Output columns of df:

ORDER, Min, Max, ITEM_SERIAL_Min, ITEM_SERIAL_Max, DIFFERENCE


Solution

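  • How to lay out a chain is largely a matter of taste, but you can wrap the whole pipeline in parentheses, put one method call per line, and comment each step. New columns are added with .assign; a lambda passed to .assign or .loc receives the intermediate DataFrame, so it can refer to columns created earlier in the chain: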

    df = (
        Dataset
        # group by ORDER and 7-day DATE bins, then aggregate each group
        .groupby(['ORDER', pd.Grouper(key='DATE', freq='7d')], as_index=False)
        .agg(
            Min=('DATE', 'min'),
            Max=('DATE', 'max'),
            ITEM_SERIAL_Min=('ITEM_SERIAL', 'first'),
            ITEM_SERIAL_Max=('ITEM_SERIAL', 'last'),
        )
        # add DIFFERENCE; the lambda receives the aggregated frame
        .assign(DIFFERENCE=lambda d: d['Max'] - d['Min'])
        # keep only rows with a non-zero DIFFERENCE
        .loc[lambda d: d['DIFFERENCE'] != '0s']
        # sort rows by ascending DIFFERENCE
        .sort_values('DIFFERENCE')
        # remove DATE column
        .drop('DATE', axis=1)
        # append the first 10 characters of Min (the YYYY-MM-DD date) to ORDER
        .assign(ORDER=lambda d: d['ORDER'] + '_' + d['Min'].astype(str).str[:10])
    )
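
To try the chain end to end, here is a minimal sketch on a small made-up frame. Only the column names come from the question; the four rows are hypothetical. The sketch calls .reset_index() as in the question instead of as_index=False, spells the bin width as '7D', and compares against pd.Timedelta(0) rather than the string '0s'. These are substitutions made for the sketch, not corrections to the code above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
Dataset = pd.DataFrame({
    'ORDER': ['A', 'A', 'A', 'B'],
    'ITEM_SERIAL': ['S1', 'S2', 'S3', 'S4'],
    'DATE': pd.to_datetime(
        ['2021-01-01', '2021-01-04', '2021-01-20', '2021-01-02']
    ),
})

df = (
    Dataset
    # group by ORDER and 7-day DATE bins, then aggregate each group
    .groupby(['ORDER', pd.Grouper(key='DATE', freq='7D')])
    .agg(
        Min=('DATE', 'min'),
        Max=('DATE', 'max'),
        ITEM_SERIAL_Min=('ITEM_SERIAL', 'first'),
        ITEM_SERIAL_Max=('ITEM_SERIAL', 'last'),
    )
    # turn the (ORDER, DATE) index back into ordinary columns
    .reset_index()
    # the lambda receives the frame produced by the previous step
    .assign(DIFFERENCE=lambda d: d['Max'] - d['Min'])
    # keep only groups whose first and last dates differ
    .loc[lambda d: d['DIFFERENCE'] != pd.Timedelta(0)]
    .sort_values('DIFFERENCE')
    # DATE holds the bin start and is no longer needed
    .drop(columns='DATE')
    # overwrite ORDER with ORDER plus the date part of Min
    .assign(ORDER=lambda d: d['ORDER'] + '_' + d['Min'].astype(str).str[:10])
)

print(df)
```

With these rows, only the group of order A whose dates run from 2021-01-01 to 2021-01-04 should pass the filter, because every other group contains a single row and so has a zero DIFFERENCE. Passing a callable to .assign and .loc is what makes the chain possible: each step refers to the intermediate result rather than to a variable that does not exist yet.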