Search code examples
pythonpandasdataframeappendsetvalue

How to append and set value in one command using Python?


I have the following dataframe (df):

    SERV_OR_IOR_ID   IMP_START_TIME IMP_CLR_TIME  IMP_START_TIME_BIN IMP_CLR_TIME_BIN
0     -1447310116       23:59:00     00:11:00                 47          0
1      1673545041       00:00:00     00:01:00                  0          0       
2      -743717696       23:59:00     00:00:00                 47          0
3       58641876        04:01:00     09:02:00                  8         18

I want to duplicate the rows for which IMP_START_TIME_BIN is less than IMP_CLR_TIME_BIN as many times as the absolute difference of IMP_START_TIME_BIN and IMP_CLR_TIME_BIN and then append (at the end of the data frame) or preferable append below that row while incrementing the value of IMP_START_TIME_BIN.

For example, for row 3, the difference is 10 and thus I should append 10 rows in the data frame incrementing the value in the IMP_START_TIME_BIN from 8(excluding) to 18(including).

The result should look like this:

    SERV_OR_IOR_ID   IMP_START_TIME IMP_CLR_TIME  IMP_START_TIME_BIN IMP_CLR_TIME_BIN
0     -1447310116       23:59:00     00:11:00                 47          0
1      1673545041       00:00:00     00:01:00                  0          0       
2      -743717696       23:59:00     00:00:00                 47          0
3       58641876        04:01:00     09:02:00                  8         18
4       58641876        04:01:00     09:02:00                  9         18
...      ...             ...          ...                     ...        ...
13      58641876        04:01:00     09:02:00                 18         18

For this I tried to do the following but it didn't work :

for i in range(len(df)): if df.ix[i,3] < df.ix[i,4]: for j in range(df.ix[i,3]+1, df.ix[i,4]+1): df = df.append((df.set_value(i,'IMP_START_TIME_BIN',j))*abs(df.ix[i,3] - df.ix[i,4]))

How can I do it ?


Solution

  • You can use this solution, only necessary index values has to be unique:

    #first filter only values for repeating
    l = df['IMP_CLR_TIME_BIN'] - df['IMP_START_TIME_BIN']
    l = l[l > 0] 
    print (l)
    3    10
    dtype: int64
    
    #repeat rows by repeating index values
    df1 = df.loc[np.repeat(l.index.values,l.values)].copy()
    
    #add counter to column IMP_START_TIME_BIN
    #better explanation http://stackoverflow.com/a/43518733/2901002
    a = pd.Series(df1.index == df1.index.to_series().shift())
    b = a.cumsum()
    a = b.sub(b.mask(a).ffill().fillna(0).astype(int)).add(1)
    df1['IMP_START_TIME_BIN'] = df1['IMP_START_TIME_BIN'] + a.values
    
    #append to original df, if necessary sort
    df = df.append(df1, ignore_index=True).sort_values('SERV_OR_IOR_ID')
    
    print (df)
        SERV_OR_IOR_ID IMP_START_TIME IMP_CLR_TIME  IMP_START_TIME_BIN  \
    0      -1447310116       23:59:00     00:11:00                  47   
    1       1673545041       00:00:00     00:01:00                   0   
    2       -743717696       23:59:00     00:00:00                  47   
    3         58641876       04:01:00     09:02:00                   8   
    4         58641876       04:01:00     09:02:00                   9   
    5         58641876       04:01:00     09:02:00                  10   
    6         58641876       04:01:00     09:02:00                  11   
    7         58641876       04:01:00     09:02:00                  12   
    8         58641876       04:01:00     09:02:00                  13   
    9         58641876       04:01:00     09:02:00                  14   
    10        58641876       04:01:00     09:02:00                  15   
    11        58641876       04:01:00     09:02:00                  16   
    12        58641876       04:01:00     09:02:00                  17   
    13        58641876       04:01:00     09:02:00                  18   
    
        IMP_CLR_TIME_BIN  
    0                  0  
    1                  0  
    2                  0  
    3                 18  
    4                 18  
    5                 18  
    6                 18  
    7                 18  
    8                 18  
    9                 18  
    10                18  
    11                18  
    12                18  
    13                18