I have a data frame like
ID YEAR_MONTH ATT_1 ATT_2
1 201301 Y 1
1 201302 Y 1
1 201302 N 0
1 201302 Y 0
1 201303 N 1
3 201301 N 1
3 201302 N 0
3 201302 Y 0
3 201302 Y 1
3 201303 Y 1
I want a final dataframe which will look like
ID YEAR_MONTH YEARMONTH_LAG1 ATT1_CHNG ATT2_CHNG
1 201301 NA NA NA
1 201302 201301 0 0
1 201303 201302 2 1
3 201301 NA NA NA
3 201302 201301 0 0
3 201303 201302 1 1
Note:
'YEARMONTH_LAG1 ' is the previous month corresponding to the current month. E.g., if YEAR_MONTH == 201301 then YEARMONTH_LAG1 = NA (as there is no record for 201212, and there is none in my data as all starts from 201201). Similarly if YEAR_MONTH == 201302 then YEARMONTH_LAG1 = 201301.
ATT1_CHNG is the number of times the level has changed (i.e., from Y to N and vice-verse) for 'ATT_1' in the previous month (i.e., in YEARMONTH_LAG1)
ATT2_CHNG is the number of times the level has changed (i.e., from 0 to 1 and vice-verse) for ATT_2 in the previous month
How can this be done in R?
In dplyr
:
require(dplyr)
df$ATT_1_New <- ifelse(df$ATT_1 == "Y", 1,0)
df %.%
group_by(ID, YEAR_MONTH) %.%
mutate(ATT_1_CHNG = sum(abs(diff(ATT_1_New))),
ATT_2_CHNG = sum(abs(diff(ATT_2)))) %.%
group_by(ID, add=FALSE) %.%
mutate(YEARMONTH_LAG1 = lag(YEAR_MONTH, 1),
ATT_1_CHNG = lag(ATT_1_CHNG,1),
ATT_2_CHNG = lag(ATT_2_CHNG,1)) %.%
group_by(ID, YEAR_MONTH, add = FALSE) %.%
summarize(YEARMONTH_LAG1 = YEARMONTH_LAG1[1],
ATT_1_CHNG = ATT_1_CHNG[1],
ATT_2_CHNG = ATT_2_CHNG[1])
# ID YEAR_MONTH YEARMONTH_LAG1 ATT_1_CHNG ATT_2_CHNG
#1 1 201301 NA NA NA
#2 1 201302 201301 0 0
#3 1 201303 201302 2 1
#4 3 201301 NA NA NA
#5 3 201302 201301 0 0
#6 3 201303 201302 1 1