I have 2 data frames, and I'm trying to do an inner join using a condition. I'll show an example to clarify what I'm trying to do:
A
cnumero cep numero bairro
1 124,35 124 35 K
2 375,NA 375 NA L
3 NA,28 NA 28 M
B
cnumero bairro XY
1 124,35 J 1
2 375,48 L 2
3 135,28 M 3
The first condition to merge is, if A$cep != NA
and A$numero != NA
, then the join is made using the column cnumero
, otherwise the join is made using the column bairro
, resulting in:
new_A
cnumero cep numero XY
1 124,35 124 35 1
2 375,NA 375 NA 2
3 NA,28 NA 28 3
What I have done so far is to make the inner join using this:
A[A$cnumero %in% unique(B$cnumero),]
, since in my real dataframe I have repeated values in data frame B
.
Edit: my data example
A = data.frame(cnumero=c("124,35", "375,NA", "NA,28"),cep = c(124, 375, NA), numero = c(35, NA, 28), bairro = c("K", "L","M"))
B = data.frame(cnumero=c("124,35", "375,48", "135,28"), bairro = c("J", "L","M"), XY = c(1, 2, 3))
new_A = data.frame(cnumero=c("124,35", "375,NA", "NA,28"),cep = c(124, 375, NA), numero = c(35, NA, 28), XY = c(1, 2, 3))
What about a solution in base R in this way, doing two steps, first the join for the first condition, then for the second, last put results together:
# the join with the first condition
A_1 <- merge(A[!is.na(A$cep) & !grepl('NA',A$cnumero), ],B, by = 'cnumero')
# select the column you need
A_1 <- A_1[,c("cnumero", "cep","numero","XY")]
# join for the second condition
A_2 <- merge(A[is.na(A$cep) | grepl('NA',A$cnumero), ],B, by = 'bairro')
# select columns you need
A_2 <- A_2[,c("cnumero.x", "cep","numero","XY")]
# rename the second part's columns
colnames(A_2) <- colnames(A_1)
# now the result
new_A <- rbind(A_1,A_2)
new_A
cnumero cep numero XY
1 124,35 124 35 1
2 375,NA 375 NA 2
3 NA,28 NA 28 3
# in case you need to remove the "temporary" tables
# remove(A_1, A_2)