Subset-friendly version of match()

I dislike (or don't understand) the match() function because it returns NA when it doesn't find a value and because it returns only the first occurrence of a match. Both are bad for subsetting one data frame by another, at least in my situation.

Let the data be

df1 <- data.frame(id=c(1:3,3:4), val= 11:15)
df2 <- data.frame(id= c(2,0,3,1), val=91:94)

I want to subset df1 by the ids in df2. My desired output is

df_expected <- data.frame(id= c(2,3,3,1), val= c(12,13,14,11))

But if I subset df1 using df1[match(df2$id, df1$id), ], I don't get what I expect, for two reasons. First, match() returns NA because it can't find id 0 in df1, which adds a row containing only NAs. Second, it returns only the first occurrence of id 3 in df1, whereas I want all occurrences of a match.
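To make both problems concrete, here is a small illustration using the data above. The variable name `pos` is only introduced for this example.

```r
df1 <- data.frame(id = c(1:3, 3:4), val = 11:15)
df2 <- data.frame(id = c(2, 0, 3, 1), val = 91:94)

# Position in df1$id of the FIRST occurrence of each df2$id value
pos <- match(df2$id, df1$id)
pos
#> [1]  2 NA  3  1

# The NA (id 0 is not in df1) becomes an all-NA row when used as an index,
# and row 4 of df1 (the second id 3) is never selected.
df1[pos, ]
```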

How can I adjust the match function so that it behaves as described above?

Solution

You can use a combination of %in% and which. The rows come back in the order of df1, though, not in the order you posted.

df1 <- data.frame(id=c(1:3,3:4), val= 11:15)
df2 <- data.frame(id= c(2,0,3,1), val=91:94)

i <- which(df1$id %in% df2$id)
df1[i, ]
#>   id val
#> 1  1  11
#> 2  2  12
#> 3  3  13
#> 4  3  14

Created on 2023-04-22 with reprex v2.0.2
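The reason the result follows df1's row order can be seen from the intermediate values. This is only a sketch of what happens in the code above; the name `keep` is introduced for illustration.

```r
df1 <- data.frame(id = c(1:3, 3:4), val = 11:15)
df2 <- data.frame(id = c(2, 0, 3, 1), val = 91:94)

# %in% yields one TRUE/FALSE per row of df1; the order of df2 plays no role
keep <- df1$id %in% df2$id
keep
#> [1]  TRUE  TRUE  TRUE  TRUE FALSE

# which() turns that into row positions of df1, always in ascending order
which(keep)
#> [1] 1 2 3 4
```

Because `keep` is already a logical index, `df1[keep, ]` selects the same rows as `df1[which(keep), ]` here.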

Edit

The following will give the posted order.

df1 <- data.frame(id=c(1:3,3:4), val= 11:15)
df2 <- data.frame(id= c(2,0,3,1), val=91:94)

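keep <- df1$id %in% df2$id        # rows of df1 whose id occurs in df2
i <- df1$id[keep]                 # matching ids, in df1 order
j <- df2$id[df2$id %in% df1$id]   # ids of df2 that occur in df1, in df2 order
# order() returns positions within i, so index the filtered rows, not df1 itself
df1[keep, ][order(match(i, j)), ]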
#>   id val
#> 2  2  12
#> 3  3  13
#> 4  3  14
#> 1  1  11


Edit 2

After reading Ingo Pingo's and TarJae's comments, I have changed the above solutions to the following two variants.

1

i <- which(df1$id %in% df2$id)
# order() gives positions within i; map them back to row numbers of df1
df1[i[order(match(df1$id[i], df2$id))], ]
#>   id val
#> 2  2  12
#> 3  3  13
#> 4  3  14
#> 1  1  11


2

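keep <- df1$id %in% df2$id
j <- df1$id[keep]
# order() gives positions within j, so index the filtered rows, not df1 itself
df1[keep, ][order(match(j, df2$id)), ]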
#>   id val
#> 2  2  12
#> 3  3  13
#> 4  3  14
#> 1  1  11

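If a reusable "match that returns all occurrences and drops non-matches" is wanted, one possible sketch is the helper below. The name `match_all` is hypothetical and not part of base R. It loops over the lookup values, so it is meant for illustration rather than for very large inputs.

```r
# Hypothetical helper: for each value of x (in the order of x), return the
# positions of ALL its occurrences in table; values without a match are dropped.
match_all <- function(x, table) {
  unlist(lapply(x, function(v) which(table == v)))
}

df1 <- data.frame(id = c(1:3, 3:4), val = 11:15)
df2 <- data.frame(id = c(2, 0, 3, 1), val = 91:94)

idx <- match_all(df2$id, df1$id)
idx
#> [1] 2 3 4 1

df1[idx, ]
#>   id val
#> 2  2  12
#> 3  3  13
#> 4  3  14
#> 1  1  11
```

Since `idx` holds row positions of df1 directly, the result does not depend on how the rows of df1 happen to be ordered.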