I want to replace values of a variable.
This is my dB :
head(mydb)
ID V1 V2 V3 V4
1 value value 0 0
2 value value 0 0
3 value value 0 0
4 value value 0 0
5 value value 0 0
6 value value 0 0
V3
and V4
have 0
for all observations.
Then I created several dBs like below:
head(newdb)
ID V3 V4
2 5 4
4 8 5
6 9 6
I want to obtain something like this :
ID V1 V2 V3 V4
1 value value 0 0
2 value value 5 4
3 value value 0 0
4 value value 8 5
5 value value 0 0
6 value value 9 6
I tried with sqldf to do it like this :
mydf <- sqldf('SELECT mydf.*, newdb.v3, newdb.v4
FROM mydf
LEFT JOIN newdb
ON mydf.ID = newdb.id')
The code I created works well and do its job; but the problem is that I'm inside a for loop, and for each of my newDB
the sql
code generates others V3
and V4
and attach it close the previous one creating something like this :
ID V1 V2 V3 V4 V3 V4 V3 V4
1 value value 0 0 1 5 0 0
2 value value 5 4 0 0 0 0
3 value value 0 0 0 0 7 8
4 value value 8 5 0 0 0 0
5 value value 0 0 2 2 0 0
6 value value 9 6 0 0 0 0
I've added another V3 and V4 columns for each iterations of my loop (in this case 3 iterations).
How can I avoid this problem?
You can simply match the IDs in two dataframes and replace them in the original one:
mydf[,c("V3","V4")] <- newdf[match(mydf$ID, newdf$ID),c("V3","V4")]
mydf
# ID V1 V2 V3 V4
# 1 1 value value NA NA
# 2 2 value value 5 4
# 3 3 value value NA NA
# 4 4 value value 8 5
# 5 5 value value NA NA
# 6 6 value value 9 6
Later you can replace NA
s with 0
s.
Update:
Instead of doing a for-loop
concatenate all the newdf
s together and then run the code on that; look at the pseudo-code below:
newdf_concat <- rbind(newdf1, newdf2)
mydf[,c("V3","V4")] <- newdf_concat[match(mydf$ID, newdf_concat$ID),c("V3","V4")]