I have 2 large dataframes (+-300k rows both). Dataframe A contains information about genes. I want to add information to this dataframe from dataframe B, if certain values overlap. If a row in df.B exists where df.B$X1 = df.A$X1 & df.B$X2 = df.A$X2, then I want the value of df.B$X3 to be added to a new column in df.A (df.A$X3), specifically at the right row. Basically I want to merge df.A and df.B based on column X1 and X2. I started off with for loops, but there must be an easier way... Hope I explained my question good enough.
Thanks!
> df.A
X1 X2
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
> df.B
X1 X2 X3
1 1 1 0.5
2 1 8 0.3
3 1 9 0.2
4 2 2 0.4
5 2 9 0.3
6 2 10 0.2
Desired outcome:
> df.A
X1 X2 X3
1 1 1 0.5
2 1 2 NA
3 1 3 NA
4 2 1 NA
5 2 2 0.4
6 2 3 NA
You want a left join
library(dplyr)
df.A %>% left_join(df.B, by = c("X1","X2"))
library(data.table)
merge(setDT(df.A), setDT(df.B), all.x = TRUE, by = c("X1","X2"))