Search code examples
pythonpandasdataframenumpygroup-by

Add dataframe rows based on external condition


I have this dataframe:

Env location lob      grid row server        model        make          slot
Prod USA     Market   AB3 bc2  Server123     Hitachi        dcs           1
Prod USA     Market   AB3 bc2  Server123     Hitachi        dcs           2
Prod USA     Market   AB3 bc2  Server123     Hitachi        dcs           3
Prod USA     Market   AB3 bc2  Server123     Hitachi       dcs           4
Dev  EMEA    Ins     AB6 bc4  Serverabc     IBM            abc           3
Dev  EMEA    Ins     AB6 bc4  Serverabc     IBM            abc          3
Dev  EMEA    Ins     AB6 bc4  Serverabc     IBM            abc           3
Dev  EMEA    Ins     AB6 bc4  Serverabc     IBM            abc           4
Dev  EMEA    Ins     AB6 bc4  Serverabc     IBM            abc           5
Dev  EMEA    Ins     AB6 bc4  Serverabc     IBM            abc           5
Dev  EMEA    Ins     AB6 bc4  Serverabc     IBM            abc           6
UAT  PAC     Retail   AB6 bc4  Serverzzz     Cisco          ust           3
UAT  PAC     Retail   BB6 bc4  Serverzzz     Cisco          ust           4
UAT  PAC     Retail   BB6 bc4  Serverzzz     Cisco          ust           5
UAT  PAC     Retail   BB6 bc4  Serverzzz     Cisco          ust           6

In this example:

  • If model is IBM, there must be 8 slots; because the slot starts from slot=3, so it must go from 3 to 10. In this case, only slots 3 to 6 are present.
    • Therefore, I need to add 4 more rows (slot 7, 8, 9, 10).
  • If model is Cisco, row count for cisco needs to be 6. Only slots 3 to 6 are present.
    • Therefore, I need to add 2 more rows

New rows:

  • must repeat the last row for the model, while incrementing the slot number
  • Their "grid" cell must indicate "available".

This needs to be done programmatically where given the model, I need to know the total number of slots and if the number of slots is short, I need to create new rows.

The final dataframe needs to be like this:

Env location lob    grid      row    server        model       make        slot
Prod USA     Market AB3       bc2  Server123     Hitachi        dcs           1
Prod USA     Market AB3       bc2  Server123     Hitachi        dcs           2
Prod USA     Market AB3       bc2  Server123     Hitachi        dcs           3
Prod USA     Market AB3       bc2  Server123     Hitachi.       dcs           4
Dev  EMEA    Ins.   AB6       bc4  Serverabc     IBM            abc           3
Dev  EMEA    Ins.   AB6       bc4  Serverabc     IBM            abc           4
Dev  EMEA    Ins.   AB6       bc4  Serverabc     IBM            abc           5
Dev  EMEA    Ins.   AB6       bc4  Serverabc     IBM            abc           6
Dev  EMEA    Ins.   available bc4  Serverabc     IBM            abc           7
Dev  EMEA    Ins.   available bc4  Serverabc     IBM            abc           8
Dev  EMEA    Ins.   available bc4  Serverabc     IBM            abc           9
Dev  EMEA    Ins.   available bc4  Serverabc     IBM            abc           10
UAT  PAC     Retail   AB6     bc4  Serverzzz     Cisco          ust           3
UAT  PAC     Retail   BB6     bc4  Serverzzz     Cisco          ust           4
UAT  PAC     Retail   BB6     bc4  Serverzzz     Cisco          ust           5
UAT  PAC     Retail   BB6     bc4  Serverzzz     Cisco          ust           6
UAT  PAC     Retail  available bc4  Serverzzz     Cisco          ust          7
UAT  PAC     Retail  available bc4  Serverzzz     Cisco          ust          8

I tried something like this:

def slots(row):
   if 'IBM' in row['model']:
      number_row=8
   if 'Cisco' in row['model']:
      number_row=6

How do I do this?


Solution

  • I would use groupby.apply to add new rows to each model. The general workflow is as follows.

    1. Remove duplicate slots from each model.

    2. Group the dataframe by the 'model' column.

    3. For each model, do anything at all only if

      a. it is either IBM or Cisco (identified by whether it is a key in N_slots dictionary)

      b. the number of slots does not reach the required number of slots (the values in N_slots dictionary)

    4. If (3) is satisfied, then use reindex() to add new empty rows.

    5. Assign slot values to the 'slots' column (e.g. For IBM, it's 3-10)

    6. Fill the newly created empty rows in the grid column by 'available'.

    7. Fill all other newly created rows by values one row above (ffill()).

    8. Reset index to remove duplicate indices.

    def add_slots(s):
        # get model name
        model = s['model'].iat[0]
        # get how many slots there should be for this model
        slots = N_slots.get(model, 0)
        # where to start reindexing
        start = s.index[0]
        low = s['slot'].dropna().astype(int).min()
        # for pandas>=1.1, remove the previous line and uncomment the next line
        # low = s['slot'].min()
    
        if len(s) < slots:
            # add new indices
            s = s.reindex(range(start, start + slots))
            # assign slots
            s['slot'] = range(low, low + slots)
            # assign grids at newly created slots to be 'available'
            s['grid'] = s['grid'].fillna('available')
            
        return s
    
    
    
    N_slots = {'IBM': 8, 'Cisco': 6}
    
    new_df = (
        df.drop_duplicates(['server', 'model', 'slot'], ignore_index=True)          # remove duplicate slots
        .groupby('model', sort=False, group_keys=False).apply(add_slots)  # add new slots
        .ffill()                                                          # fill rest of the columns
        .sort_values(by=['server', 'slot'])
        .reset_index(drop=True)                                           # reset index
    )
    
         Env location     lob       grid  row     server    model make  slot
    0   Prod      USA  Market        AB3  bc2  Server123  Hitachi  dcs     1
    1   Prod      USA  Market        AB3  bc2  Server123  Hitachi  dcs     2
    2   Prod      USA  Market        AB3  bc2  Server123  Hitachi  dcs     3
    3   Prod      USA  Market        AB3  bc2  Server123  Hitachi  dcs     4
    4    Dev     EMEA     Ins        AB6  bc4  Serverabc      IBM  abc     3
    5    Dev     EMEA     Ins        AB6  bc4  Serverabc      IBM  abc     4
    6    Dev     EMEA     Ins        AB6  bc4  Serverabc      IBM  abc     5
    7    Dev     EMEA     Ins        AB6  bc4  Serverabc      IBM  abc     6
    8    Dev     EMEA     Ins  available  bc4  Serverabc      IBM  abc     7
    9    Dev     EMEA     Ins  available  bc4  Serverabc      IBM  abc     8
    10   Dev     EMEA     Ins  available  bc4  Serverabc      IBM  abc     9
    11   Dev     EMEA     Ins  available  bc4  Serverabc      IBM  abc    10
    12   UAT      PAC  Retail        AB6  bc4  Serverzzz    Cisco  ust     3
    13   UAT      PAC  Retail        BB6  bc4  Serverzzz    Cisco  ust     4
    14   UAT      PAC  Retail        BB6  bc4  Serverzzz    Cisco  ust     5
    15   UAT      PAC  Retail        BB6  bc4  Serverzzz    Cisco  ust     6
    16   UAT      PAC  Retail  available  bc4  Serverzzz    Cisco  ust     7
    17   UAT      PAC  Retail  available  bc4  Serverzzz    Cisco  ust     8