
Generate rows to make a sequence in a column of a dataframe


I'm trying to generate new rows based on the values in a certain column. As you can see, in the current data the 'days_left' column does not contain all sequential values.

import pandas as pd

current = {'assignment': [1,1,1,1,2,2,2,2,2], 'days_left': [1, 2, 5, 9,1, 3, 4, 8, 13]}
dfcurrent = pd.DataFrame(data=current)
dfcurrent

I want to add rows to that dataframe so that 'days_left' forms a sequential list for each 'assignment'. Please see the desired output below:

desired = {'assignment': [1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2],
           'days_left': [1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,10,11,12,13]}
dfdesired = pd.DataFrame(data=desired)
dfdesired

Note: The original data is much bigger and has other columns as well, but I simplified it for this question.

Could you please help me solve this?

Thank you very much in advance!


Solution

  • You can iterate through the rows of the current dataframe and build a new dataframe. For each gap between consecutive days_left values, copy the current row once per missing value and set its days_left to that value. The last row of each assignment is copied unchanged.

    Try this code:

    import pandas as pd
    
    current = {'assignment': [1,1,1,1,2,2,2,2,2], 'days_left': [1, 2, 5, 9, 1, 3, 4, 8, 13]}
    dfc = pd.DataFrame(data=current)
    
    rows = []  # collected rows (DataFrame.append was removed in pandas 2.0)
    
    for r in range(1,len(dfc)):  # start at 2nd row
       if dfc.iloc[r-1]['assignment']==dfc.iloc[r]['assignment']:  # only fill gaps within one assignment
          for i in range(dfc.iloc[r-1]['days_left'],dfc.iloc[r]['days_left']): # fill gap of missing numbers
             row = dfc.iloc[r].copy() # copy row
             row['days_left'] = i  # update column value
             rows.append(row)
       if r == len(dfc)-1 or dfc.iloc[r+1]['assignment']!=dfc.iloc[r]['assignment']:  # last entry in assignment
          rows.append(dfc.iloc[r]) # copy row
    
    dfd = pd.DataFrame(rows).reset_index(drop=True)  # new dataframe without duplicated index
    dfd = dfd.astype(int)  # convert all data to integers (only valid if every column is numeric)
    print(dfd.to_string(index=False))
    

    Output

     assignment  days_left
              1          1
              1          2
              1          3
              1          4
              1          5
              1          6
              1          7
              1          8
              1          9
              2          1
              2          2
              2          3
              2          4
              2          5
              2          6
              2          7
              2          8
              2          9
              2         10
              2         11
              2         12
              2         13
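
For larger data, a row-by-row loop can get slow. As an alternative (not the approach used in the answer above), here is a minimal sketch that builds every (assignment, days_left) pair between each assignment's smallest and largest days_left and then left-merges the original data onto it. The names `bounds` and `full` are made up for this illustration, and it assumes each (assignment, days_left) pair appears at most once in the original data.

```python
import pandas as pd

current = {'assignment': [1, 1, 1, 1, 2, 2, 2, 2, 2],
           'days_left': [1, 2, 5, 9, 1, 3, 4, 8, 13]}
dfc = pd.DataFrame(data=current)

# One row per assignment with its smallest and largest days_left
bounds = dfc.groupby('assignment')['days_left'].agg(['min', 'max']).reset_index()

# Every (assignment, days_left) pair inside each assignment's range
full = pd.DataFrame(
    [(a, d) for a, lo, hi in bounds.itertuples(index=False) for d in range(lo, hi + 1)],
    columns=['assignment', 'days_left'],
)

# Left-merge so that any other columns of the original data are carried over;
# rows that did not exist before get NaN in those columns
dfd = full.merge(dfc, on=['assignment', 'days_left'], how='left')
print(dfd.to_string(index=False))
```

Unlike the loop, which copies the other columns from the row after each gap, this sketch leaves them as NaN in the generated rows. If you want them filled, something like a per-assignment `bfill()` afterwards should give a similar result, depending on what those columns mean in your data.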