Search code examples
pythonpandasdataframepandas-groupby

check if cumsum of the column is greater than range value than increment the element in list


I have a list

sample_dates = ["10/07/2021","11/07/2021","12/07/2021","13/07/2021",
                "14/07/2021","15/07/2021","16/07/2021","17/07/2021",
                "18/07/2021","19/07/2021","20/07/2021","21/07/2021",
                "22/07/2021","23/07/2021","24/07/2021"]

and dataframe like below

Truckid   Tripid   kms    
  1          1     700.3  
  1          1     608.9        
  1          1     400.2  
  1          2     100.2  
  1          2     140.8        
  1          3     1580.0 
  1          3     357.3        
  1          3     541.5  
  1          4     421.2   
  1          4     1694.4 
  1          4     1585.9 
  1          5     173.3  
  1          5     237.4   
  1          5     83.3   
  2          1     846.1  
  2          1     1167.6  
  2          2     388.8  
  2          2     70.5   
  2          2     127.1  
  2          3     126.7  
  2          3     262.4  

I want Date column by cumsum,if kms > 0 & < 2000 should have same date,if it increase 2000 than change the date, and than if it is > 2000 & < 3000 than do not change and than if its passes 3000 than again change the date. and so on

also if tripid changes than restart the counting from 0.

I want something like this

Truckid   Tripid   kms        Date
  1          1     700.3      10/07/2021
  1          1     608.9      10/07/2021      
  1          1     400.2      10/07/2021
  1          2     100.2      11/07/2021
  1          2     140.8      11/07/2021      
  1          3     1580.0     12/07/2021
  1          3     357.3      12/07/2021      
  1          3     541.5      13/07/2021
  1          4     421.2      14/07/2021 
  1          4     1694.4     15/07/2021
  1          4     1585.9     16/07/2021
  1          5     173.3      17/07/2021
  1          5     237.4      17/07/2021 
  1          5     83.3       17/07/2021
  2          1     846.1      18/07/2021
  2          1     1167.6     19/07/2021 
  2          2     388.8      20/07/2021
  2          2     70.5       20/07/2021
  2          2     127.1      20/07/2021
  2          3     126.7      21/07/2021
  2          3     262.4      21/07/2021

Solution

  • You can compute the cumsum per group and either cut is manually or use a mathematical trick to make groups.

    Then map your dates:

    # round to thousands, clip to get min 1000 km
    kms = df.groupby(['Truckid', 'Tripid'])['kms'].cumsum().floordiv(1000).clip(1)
    
    # OR use manual bins
    kms = pd.cut(df.groupby(['Truckid', 'Tripid'])['kms'].cumsum(),
                 bins=[0,2000,3000,4000]) # etc. up to max wanted value
    
    
    df['Date'] = (df
                  .groupby(['Truckid', 'Tripid', kms]).ngroup() # get group ID
                  .map(dict(enumerate(sample_dates)))      # match to items in order
                 )
    

    alternative to use consecutive days from the starting point:

    df['Date'] = pd.to_datetime(df.groupby(['Truckid', 'Tripid', kms]).ngroup(),
                                unit='d', origin='2021-07-10')
    

    output:

        Truckid  Tripid     kms        Date
    0         1       1   700.3  10/07/2021
    1         1       1   608.9  10/07/2021
    2         1       1   400.2  10/07/2021
    3         1       2   100.2  11/07/2021
    4         1       2   140.8  11/07/2021
    5         1       3  1580.0  12/07/2021
    6         1       3   357.3  12/07/2021
    7         1       3   541.5  13/07/2021
    8         1       4   421.2  14/07/2021
    9         1       4  1694.4  15/07/2021
    10        1       4  1585.9  16/07/2021
    11        1       5   173.3  17/07/2021
    12        1       5   237.4  17/07/2021
    13        1       5    83.3  17/07/2021
    14        2       1   846.1  18/07/2021
    15        2       1  1167.6  19/07/2021
    16        2       2   388.8  20/07/2021
    17        2       2    70.5  20/07/2021
    18        2       2   127.1  20/07/2021
    19        2       3   126.7  21/07/2021
    20        2       3   262.4  21/07/2021