I need merge two data frames, without having duplicate rows/duplicated data if there is more than one match. Basically, if the matching between my dataframes is ambiguous, I would like the ambiguous rows to NOT be matched, and each row to remain separate instead. I've been using the merge function, but it states that If there is more than one match, all possible matches contribute one row each, and I got to the same result using join/inner_join functions. Is there a way to fix this ?
Example:
df1
ID | Question 1 | Question 2 |
---|---|---|
A1 | 1 | 2 |
B1 | 3 | 4 |
C1 | 5 | 6 |
C1 | 7 | 8 |
df2
ID | Question 3 | Question 4 |
---|---|---|
A1 | a | b |
B1 | c | d |
C1 | e | f |
C1 | g | h |
What I get using merge by ID
ID | Question 1 | Question 2 | Question 3 | Question 4 |
---|---|---|---|---|
A1 | 1 | 2 | a | b |
B1 | 3 | 4 | c | d |
C1 | 5 | 6 | e | f |
C1 | 7 | 8 | g | h |
C1 | 5 | 6 | g | h |
C1 | 7 | 8 | e | f |
What I want
ID | Question 1 | Question 2 | Question 3 | Question 4 |
---|---|---|---|---|
A1 | 1 | 2 | a | b |
B1 | 3 | 4 | c | d |
C1 | 5 | 6 | NA | NA |
C1 | 7 | 8 | NA | NA |
C1 | NA | NA | e | f |
C1 | NA | NA | g | h |
Thank you for your help !
You can try to identify the duplicated
id's, exclude them when using merge
and rbind
them afterwards.
X <- unique(c(df1$ID[duplicated(df1$ID)], df2$ID[duplicated(df2$ID)]))
plyr::rbind.fill(merge(df1[!df1$ID %in% X,], df2[!df2$ID %in% X,]),
df1[df1$ID %in% X,],
df2[df2$ID %in% X,])
# ID Question1 Question2 Question3 Question4
#1 A1 1 2 a b
#2 B1 3 4 c d
#3 C1 5 6 <NA> <NA>
#4 C1 7 8 <NA> <NA>
#5 C1 NA NA e f
#6 C1 NA NA g h
Data
df2 <- read.table(header=TRUE, text="ID Question3 Question4
A1 a b
B1 c d
C1 e f
C1 g h")
df1 <- read.table(header=TRUE, text="
ID Question1 Question2
A1 1 2
B1 3 4
C1 5 6
C1 7 8")