Match all occurrences in a data frame

I'm trying to do something similar to this post: "Extract rows for the first occurrence of a variable in a data frame", but I want to extract all occurrences, not just the first.

Here is a simplified example. I have this data frame called toDrop:

Gene   Taxa
123    A
327    B
445    D
557    A
789    E
123    B
557    C

Here's my code, which uses match() and therefore returns only the first match. I'm running this inside a loop, so I've simplified things here.

Gene <- c("123", "327", "445", "557", "789", "123", "557")
Taxa <- c("A", "B", "D", "A", "E", "B", "C")
toDrop <- data.frame(Gene, Taxa)
Temp <- list()
geneNameTemp <- "123"
toDrop[match(geneNameTemp, toDrop$Gene), 2] -> Temp

In this example, Temp should be a list of "A" and "B". I think I need to use lapply as in that post, but I can't figure it out from that example. Thanks for the help.

Solution

There are several ways to do this. One way in base R that is close to what you already have is which() combined with %in%:

Gene <- c("123", "327", "445", "557", "789", "123", "557")
Taxa <- c("A", "B", "D", "A", "E", "B", "C")
toDrop <- data.frame(Gene, Taxa)
Temp <- list()
geneNameTemp <- "123"
Temp <- as.list(toDrop[which(toDrop$Gene %in% geneNameTemp),2])
Temp
# [[1]]
# [1] A
# Levels: A B C D E
# 
# [[2]]
# [1] B
# Levels: A B C D E

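This returns a list with the two matching values. They print with Levels because data.frame() converted Taxa to a factor, which was the default before R 4.0.0; from R 4.0.0 on, strings stay character unless you pass stringsAsFactors = TRUE. The same approach works when geneNameTemp is a vector, but the result will include duplicates if there are any: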

Gene <- c("123", "327", "445", "557", "789", "123", "557")
Taxa <- c("A", "B", "D", "A", "E", "B", "C")
toDrop <- data.frame(Gene, Taxa)
Temp <- list()
geneNameTemp <- c("123", "327")
Temp <- as.list(toDrop[which(toDrop$Gene %in% geneNameTemp),2])
Temp
# [[1]]
# [1] A
# Levels: A B C D E
# 
# [[2]]
# [1] B
# Levels: A B C D E
# 
# [[3]]
# [1] B
# Levels: A B C D E

If you only need a vector of the values, you can drop as.list(). To remove the duplicates, use unique(toDrop[which(toDrop$Gene %in% geneNameTemp),2]). The which() call is optional here, because %in% never returns NA and a logical vector can index rows directly.
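
As a rough sketch of those last two variants, the snippet below reuses the example data but passes stringsAsFactors = FALSE so that Taxa stays a character vector on any R version. The names allTaxa, uniqueTaxa and taxaByGene are made up for illustration, and the split() step is only one possible way to avoid the loop mentioned in the question, not something the original code does.

```r
# Same example data; stringsAsFactors = FALSE keeps Taxa as plain character
# strings regardless of the R version.
toDrop <- data.frame(
  Gene = c("123", "327", "445", "557", "789", "123", "557"),
  Taxa = c("A", "B", "D", "A", "E", "B", "C"),
  stringsAsFactors = FALSE
)
geneNameTemp <- c("123", "327")

# All matching Taxa as a plain vector (duplicates kept).
allTaxa <- toDrop$Taxa[toDrop$Gene %in% geneNameTemp]

# The same matches with duplicates removed.
uniqueTaxa <- unique(allTaxa)

# Hypothetical alternative to looping over genes: group Taxa by Gene once,
# then look up any gene by name.
taxaByGene <- split(toDrop$Taxa, toDrop$Gene)
taxaByGene[["123"]]
```

With split(), every gene's Taxa are collected in a single pass, so inside a loop you would only need taxaByGene[[geneNameTemp]] for a single gene name instead of re-scanning the data frame each time.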