I have a data frame consisting of three columns and the unique values for status are as follows "X" "0" "C" "1" "2" "3" "4" "5". In the beginning, I do not know how to group by each id and create several columns according to the conditions, for instance, a target column that is 1 if the status is 2, 3, 4, 5, and else is zero.
month_balance represents (The month of the extracted data is the starting point, backwards, 0 is the current month, -1 is the previous month, and so on)
status represents (0: 1-29 days past due, 1: 30-59 days past due, 2: 60-89 days overdue, 3: 90-119 days overdue, 4: 120-149 days overdue, 5: Overdue or bad debts write-offs for more than 150 days C: paid off that month, X: No loan for the month)
df <- data.frame (id = c("5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805"),
month_balance = c("0","-1","-2","-3","-4","-5","-6","-7","-8","-9","-10","-11","-12","-13","-14","-15","0","-1","-2","-3","-4","-5","-6","-7","-8","-9","-10","-11","-12","-13","-14"),
status = c("C","C","C","C","C","C","C","C","C","C","C","C","C","1","0","X","C","C","C","C","C","C","C","C","C","C","C","C","1","0","X")
)
In the end, I want to reach output as below:
df1 <- data.frame (id = c("5008804","5008805"),
month_begin = c("16","15"),
paid_off = c("13","12"),
num_of_pastdues = c("2","2"),
no_loan = c("1","1"),
target = c("0","0"))
A base R solution can be to create a custom function and apply it on each group, i.e.
MyFunction <- function(x){
month_begin = length(x)
paid_off = sum(x == 'C')
num_of_pastdues = sum(x %in% 0:5)
no_loan = sum(x == 'X')
target = ifelse(any(x %in% 2:5), 1, 0)
return(c(month_begin=month_begin, paid_off=paid_off, num_of_pastdues=num_of_pastdues, no_loan=no_loan, target=target))
}
res <- t(sapply(split(df$status, df$id), MyFunction))
month_begin paid_off num_of_pastdues no_loan target
# 5008804 16 13 2 1 0
# 5008805 15 12 2 1 0
To make it a data frame with the column id then,
res_df <- data.frame(res)
res_df$id <- rownames(res_df)
rownames(res_df) <- NULL
res_df
#month_begin paid_off num_of_pastdues no_loan target id
#1 16 13 2 1 0 5008804
#2 15 12 2 1 0 5008805