Consider a base data frame as:
data <- data.frame(amount_bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+", "10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+", "10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+"),
risk_score = c("0-700", "700-750", "750-800", "800-850", "850-900", "0-700", "700-750", "750-800", "800-850", "850-900", "0-700", "700-750", "750-800", "800-850", "850-900"))
and grouping information in another data frame as:
group_info <- data.frame(variable = c("amount_bin_group", "amount_bin_group", "amount_bin_group", "amount_bin_group", "amount_bin_group",
"risk_score_group", "risk_score_group", "risk_score_group", "risk_score_group", "risk_score_group"),
bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+",
"0-700", "700-750", "750-800", "800-850", "850-900"),
group = c("1", "1", "2", "2", "3",
"a", "a", "a", "b", "b"))
I want to make 2 columns in base data frame (data) called "amount_bin_group" and "risk_score_group", which takes values from the column group_info$group when bin columns from group_info and data are the same. For simplicity, let's assume that the base column will always be the group_info$variable name minus the "group" string. That implies, when we want to create column amount_bin_group, base column will always be amount_bin in the base data frame.
The expected result data frame is:
final_data <- data.frame(amount_bin = c("10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+", "10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+", "10K-25K", "25K-35K", "35K-45K", "45K-50K", "50K+"),
risk_score = c("0-700", "700-750", "750-800", "800-850", "850-900", "0-700", "700-750", "750-800", "800-850", "850-900", "0-700", "700-750", "750-800", "800-850", "850-900"),
amount_bin_group = c("1", "1", "2", "2", "3", "1", "1", "2", "2", "3", "1", "1", "2", "2", "3"),
risk_score_group = c("a", "a", "a", "b", "b", "a", "a", "a", "b", "b", "a", "a", "a", "b", "b"))
A solution that I just thought is iteratively merge the data frames i.e. :
final_data <- merge(data, group_info[, c("bin", "group")], by.x = "amount_bin", by.y = "bin")
final_data$amount_bin_group <- final_data$group
final_data$group <- NULL
But, I am sure there can be a more efficient solution. Please note that there are multiple such columns and not just two. So, maybe a loop would help.
You could just use a for
loop to keep merging on the different sets:
for (i in unique(group_info$variable)) {
data <- merge(
data, group_info[group_info$variable==i,c("bin","group")],
by.x=sub("_group","",i), by.y="bin"
)
names(data)[names(data)=="group"] <- i
}