Search code examples
rdataframelag

Counting the number of level changes of an attribute


I have a data frame like

ID  YEAR_MONTH  ATT_1   ATT_2
1   201301        Y     1
1   201302        Y     1
1   201302        N     0
1   201302        Y     0
1   201303        N     1
3   201301        N     1
3   201302        N     0
3   201302        Y     0
3   201302        Y     1
3   201303        Y     1

I want a final dataframe which will look like

ID  YEAR_MONTH  YEARMONTH_LAG1  ATT1_CHNG   ATT2_CHNG
1   201301           NA           NA          NA
1   201302           201301       0           0
1   201303           201302       2           1
3   201301           NA           NA          NA
3   201302           201301       0           0
3   201303           201302       1           1

Note:

  1. 'YEARMONTH_LAG1 ' is the previous month corresponding to the current month. E.g., if YEAR_MONTH == 201301 then YEARMONTH_LAG1 = NA (as there is no record for 201212, and there is none in my data as all starts from 201201). Similarly if YEAR_MONTH == 201302 then YEARMONTH_LAG1 = 201301.

  2. ATT1_CHNG is the number of times the level has changed (i.e., from Y to N and vice-verse) for 'ATT_1' in the previous month (i.e., in YEARMONTH_LAG1)

  3. ATT2_CHNG is the number of times the level has changed (i.e., from 0 to 1 and vice-verse) for ATT_2 in the previous month

How can this be done in R?


Solution

  • In dplyr:

    require(dplyr)
    
    df$ATT_1_New <- ifelse(df$ATT_1 == "Y", 1,0)
    
    df %.% 
      group_by(ID, YEAR_MONTH) %.% 
      mutate(ATT_1_CHNG = sum(abs(diff(ATT_1_New))),
             ATT_2_CHNG = sum(abs(diff(ATT_2)))) %.%
      group_by(ID, add=FALSE) %.%
      mutate(YEARMONTH_LAG1 = lag(YEAR_MONTH, 1),
             ATT_1_CHNG = lag(ATT_1_CHNG,1),
             ATT_2_CHNG = lag(ATT_2_CHNG,1)) %.%
      group_by(ID, YEAR_MONTH, add = FALSE) %.%
      summarize(YEARMONTH_LAG1 = YEARMONTH_LAG1[1],
                ATT_1_CHNG =  ATT_1_CHNG[1], 
                ATT_2_CHNG =  ATT_2_CHNG[1])
    
    
    #  ID YEAR_MONTH YEARMONTH_LAG1 ATT_1_CHNG ATT_2_CHNG
    #1  1     201301             NA         NA         NA
    #2  1     201302         201301          0          0
    #3  1     201303         201302          2          1
    #4  3     201301             NA         NA         NA
    #5  3     201302         201301          0          0
    #6  3     201303         201302          1          1