I dislike (or don't understand) the match() function because it returns NA if it doesn't find a value and because it returns only the first occurrence of a match. Both are bad for subsetting one data frame by another, at least in my situation.
Let the data be
df1 <- data.frame(id=c(1:3,3:4), val= 11:15)
df2 <- data.frame(id= c(2,0,3,1), val=91:94)
I want to subset df1 by the ids of df2. So my desired output is
df_expected <- data.frame(id= c(2,3,3,1), val= c(12,13,14,11))
But if I try to subset df1 using df1[match(df2$id, df1$id), ], I don't get what I expect. Firstly, match returns an NA because it can't find id 0 in df1, which adds a row containing just NAs. Secondly, it returns only the first occurrence of id 3 in df1, but I want all occurrences of a match.
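The behavior described above can be reproduced with a short sketch. It only uses the data from the question; the names pos and res are introduced here for illustration.

```r
df1 <- data.frame(id = c(1:3, 3:4), val = 11:15)
df2 <- data.frame(id = c(2, 0, 3, 1), val = 91:94)

# match() returns exactly one position per element of df2$id:
# NA for id 0 (absent from df1) and only the first position for id 3.
pos <- match(df2$id, df1$id)
pos
#> [1]  2 NA  3  1

# Indexing with an NA position yields a row of NAs, and row 4 of df1
# (the second occurrence of id 3) is never selected.
res <- df1[pos, ]
res
```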
How can I adjust the match function so that it behaves as described above?
You can use a combination of %in% and which. The output is not in the order you posted, though.
df1 <- data.frame(id=c(1:3,3:4), val= 11:15)
df2 <- data.frame(id= c(2,0,3,1), val=91:94)
i <- which(df1$id %in% df2$id)
df1[i, ]
#>   id val
#> 1  1  11
#> 2  2  12
#> 3  3  13
#> 4  3  14
Created on 2023-04-22 with reprex v2.0.2
The following will give the posted order.
df1 <- data.frame(id=c(1:3,3:4), val= 11:15)
df2 <- data.frame(id= c(2,0,3,1), val=91:94)
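i <- which(df1$id %in% df2$id)    # row positions in df1 that have a match
j <- df2$id[df2$id %in% df1$id]   # ids of df2 that occur in df1, in df2's order
# index through i so the result stays correct when the matching rows
# are not the leading rows of df1
df1[i[order(match(df1$id[i], j))], ]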
#>   id val
#> 2  2  12
#> 3  3  13
#> 4  3  14
#> 1  1  11
After reading Ingo Pingo's and TarJae's comments I have changed the above solutions to
i <- which(df1$id %in% df2$id)
# order() returns positions within i, so map them back to rows of df1
df1[i[order(match(df1$id[i], df2$id))], ]
#>   id val
#> 2  2  12
#> 3  3  13
#> 4  3  14
#> 1  1  11
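j <- df1[df1$id %in% df2$id, ]   # keep only the matching rows of df1
# reorder the subset itself, so positions from order() refer to the right rows
j[order(match(j$id, df2$id)), ]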
#>   id val
#> 2  2  12
#> 3  3  13
#> 4  3  14
#> 1  1  11
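As a rough check, the %in% / which / order(match()) idea can be wrapped in a small function and compared against df_expected from the question. This is only a sketch: subset_by_ids() and df3 are hypothetical names that do not appear in the original posts, and the function assumes the key column is called id. It indexes through the matched row positions, so it should also work when the matching rows are not the first rows of the data frame.

```r
# Hypothetical helper (not from the original answer): return the rows of `x`
# whose `id` occurs in `ids`, ordered by first appearance in `ids`.
subset_by_ids <- function(x, ids) {
  i <- which(x$id %in% ids)             # row positions of x that have a match
  x[i[order(match(x$id[i], ids))], ]    # reorder those positions by `ids`
}

df1 <- data.frame(id = c(1:3, 3:4), val = 11:15)
df2 <- data.frame(id = c(2, 0, 3, 1), val = 91:94)
df_expected <- data.frame(id = c(2, 3, 3, 1), val = c(12, 13, 14, 11))

res <- subset_by_ids(df1, df2$id)

# Compare values column by column; row names and integer/double storage differ.
stopifnot(all(res$id == df_expected$id), all(res$val == df_expected$val))

# The matching rows need not be the leading rows: reverse df1 and try again.
df3 <- df1[5:1, ]
subset_by_ids(df3, df2$id)
```

Rows that share an id keep their relative order from the input, because order() breaks ties by position.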