Search code examples
pythonpandasdataframeexplode

Create a pandas dataframe column of variable sized lists


I need to create a dataframe that contains all possible start times for a scheduler for some machines. My initial dataframe (msDF) contains three simple columns:

  • MachID - the ID of each machine
  • Start - the starting datetime that the machine is available for scheduling
  • slots - the number of slots available starting from that time

msDF is copied from a master dataframe, but for illustration, it may look like this:

msDF = pd.DataFrame({ 'MachID': [1,2,3,4,5],
                      'Start': ["02/04/2021 9:00","06/04/2021 12:30","09/04/2021 10:00", \
                                "12/04/2021 11:00","15/04/2021 08:00"],
                      'slots': [2, 3, 4, 3, 1]})
MachID Start slots
0 1 02/04/2021 9:00 2
1 2 06/04/2021 12:30 3
2 3 09/04/2021 10:00 4
3 1 12/04/2021 11:00 3
4 1 15/04/2021 08:00 1

I need to explode this dataframe so that each row is duplicated "slots" times with a slotIndex. The desired output is:

MachID Start slots SlotIndex
0 1 02/04/2021 9:00 2 0
0 1 02/04/2021 9:00 2 1
1 2 06/04/2021 12:30 3 0
1 2 06/04/2021 12:30 3 1
1 2 06/04/2021 12:30 3 2

My approach is problematic. I am creating variable length lists into the SlotIndex and exploding them, but this creates warnings.

To do this, I use:

msDF['SlotIndex'] = None
for x in msDF.index:
    msDF.SlotIndex.loc[x] = list(range(msDF.loc[x,'slots']))

It works but with warnings : SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

I later explode msDF to get the result I want:

msDF = msDF.explode('SlotIndex')

How can this be improved?


Solution

  • Use repeat.

    df.loc[df.index.repeat(df.slots)]
    

    The index will be repeated. So you can use that to set the slot id.

    df['slot_id'] = 1
    df['slot_id'] = df.groupby(df.index)['slot_id'].transform('cumsum')