Editing the order and content of schedule file in Pandas, Python based on condition

I'm trying to edit a schedule file in pandas, python 3 and am very stuck at the moment.

Basically, I have a schedule file like this:

id    trip_id    origin    destination    courier_status     package_origin   package_destination
1        1         A           B              False                nan               nan
1        2         B           C               True                 X                 Y
2        1         F           G              False                nan               nan
2        2         G           H               True                 Q                 R
2        3         H           I              False                nan               nan

If the courier_status is true, I want them (the person in id) to make a detour to the package_origin and package_destination before continuing to destination, thus changing their schedule file. Ideally, the new schedule file is supposed to be like this, newSchedule:

id    trip_id    origin    destination   status
1        1         A           B         normal
1        2         B           X         courier
1        3         X           Y         courier
1        4         Y           C         normal      
2        1         F           G         normal
2        2         G           Q         courier
2        3         Q           R         courier
2        4         R           H         normal
2        5         H           I         normal

My idea was to make a new df, consisting of only additional trips, then append them to the existing schedule, then remove the duplicates and keep='last', afterwards apply the sort_values on id. However, I haven't been able to make the newSchedule DataFrame. Can anybody give me a hand or give me a direction on what kind of algorithm should I use? I was thinking of using a loop or using np.where?

The real data have many more columns and row, I just want to know how can I work with this. I am a rookie on working with python so I'm very lost at the moment.

Please help!

Solution

Here's one option. First, you can split up the DataFrame based on the courier_status column. Many ways to do this, here I use a groupby:

(_, df_n), (_, df_c) = df.groupby('courier_status')

The normal DataFrame is easy to deal with, just drop some columns and assign the status:

df_n['status'] = 'normal'
df_n = df_n.drop(columns=['courier_status', 'package_origin', 'package_destination', 'trip_id'])

The courier DataFrame takes a lot more work. Here we'll need to form the chain from ['origin', 'package_origin', 'package_destination', 'destination'], which can be accomplished by specifying that order, stacking and joining together a shifted version. There's some cleaning up of things I threw in the Index that we needed to keep. Finally assign everything but the last 'package_origin' -> 'package_destination' part a status of 'courier'.

s = (df_c.set_index(['id'], append=True)
        [['origin', 'package_origin', 'package_destination', 'destination']].stack()
    )

df_c = (pd.concat([s.rename('origin'), s.groupby(level=0).shift(-1).rename('destination')], axis=1)
          .dropna()
          .reset_index(['id'])
          .reset_index(-1, drop=True)
          .assign(status='courier'))

df_c.loc[~df_c.index.duplicated(keep='last'), 'status'] = 'normal'

Finally, because we kept the original index all of this time, we can concat the two together and then sort_index to put the rows in the order they should appear, and define the 'trip_id' using a groupby + cumcount:

result = pd.concat([df_n, df_c]).sort_index()
result['trip_id'] = result.groupby('id').cumcount()+1

#   id origin destination   status  trip_id
#0   1      A           B   normal        1
#1   1      B           X  courier        2
#1   1      X           Y  courier        3
#1   1      Y           C   normal        4
#2   2      F           G   normal        1
#3   2      G           Q  courier        2
#3   2      Q           R  courier        3
#3   2      R           H   normal        4
#4   2      H           I   normal        5