Search code examples
rpandascumulative-sumderived-column

Creating Sequence Sum using Pandas and R


enter image description here

Hi All i want to calculate sequence and Flag using ID,Month and Value. for every unique id if value changes to zero then sequence is 1 and if it continues to be zero for succesive months then sequence will add as shown above.

Flag will be 1 when the sequence add to 6.

Please help i want to do it using Pandas and R


Solution

  • In R, we can create the 'Sequence' and 'Flag' using data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'Id', we specify the 'i' with Value == 0, create the 'Sequence' as the sequence (1:.N) based on the TRUE values in 'i'. Then, create the 'Flag' by assigning (:=) 1 for those elements that are 1 for the 'Sequence'. If there are 'Id' that have no 1 or all the 'Flag' values are NA, then concatenate 0 at the top while removing one element from 'Flag' (Flag[-1]) or else return the 'Flag'

    library(data.table)
    setDT(df1)[Value == 0, Sequence := 1:.N , by = Id]
    df1[Sequence ==1, Flag := 1][, Flag := if(all(is.na(Flag))) 
                         c(0, Flag[-1]) else Flag, by = Id]
    df1
    #        Id  Month Value Sequence Flag
    # 1: SCSR1 Jan-16   400       NA   NA
    # 2: SCSR1 Feb-16     0        1    1
    # 3: SCSR1 Mar-16     0        2   NA
    # 4: SCSR1 Apr-16     0        3   NA
    # 5: SCSR1 May-16     0        4   NA
    # 6: SCSR1 Jun-16     0        5   NA
    # 7: SCSR1 Jul-16     0        6   NA
    # 8: SCCS9 Jan-16   440       NA    0
    # 9: SCCS9 Feb-16  3000       NA   NA
    #10: SCCS9 Mar-16   400       NA   NA
    #11: SCCS9 Apr-16   100       NA   NA
    #12: SCCS9 May-16   300       NA   NA
    #13: SCCS9 Jun-16   400       NA   NA
    #14: SCCS9 Jul-16   100       NA   NA
    #15: SKHH1 Jan-16  1000       NA   NA
    #16: SKHH1 Feb-16     0        1    1
    #17: SKHH1 Mar-16     0        2   NA
    #18: SKHH1 Apr-16     0        3   NA
    #19: SKHH1 May-16     0        4   NA
    #20: SKHH1 Jun-16     0        5   NA
    #21: SKHH1 Jul-16     0        6   NA
    

    NOTE: It is better to have NA as missing values than blanks ("") to preserve the class of the column.

    data

    df1 <- data.frame(Id = rep(c("SCSR1", "SCCS9", "SKHH1"), each = 7),
        Month = rep(c('Jan-16', 'Feb-16', 'Mar-16', 'Apr-16', 'May-16', 'Jun-16',
       'Jul-16'), 3), Value = c(400, rep(0, 6), 440, 3000, 400, 100, 300, 400,
       100, 1000, rep(0,6)))