I'm trying to do something similar to this post: "Extract rows for the first occurrence of a variable in a data frame", but I want to extract all occurrences, not just the first.
Here is a simplified example. I have this data frame called toDrop:
Gene Taxa
123 A
327 B
445 D
557 A
789 E
123 B
557 C
Here's my code, which uses match and thus returns only the first match. I'm running this inside a loop, so I've simplified things here.
Gene <- c("123", "327", "445", "557", "789", "123", "557")
Taxa <- c("A", "B", "D", "A", "E", "B", "C")
toDrop <- data.frame(Gene, Taxa)
Temp <- list()
geneNameTemp <- "123"
toDrop[match(geneNameTemp, toDrop$Gene), 2] -> Temp
In this example, Temp should return a list of "A" and "B". I think I need to use lapply as in this post, but I can't figure it out from that example. Thanks for the help.
There are several ways to do this. One way in base R that is close to what you've already got is which() combined with %in%:
Gene <- c("123", "327", "445", "557", "789", "123", "557")
Taxa <- c("A", "B", "D", "A", "E", "B", "C")
toDrop <- data.frame(Gene, Taxa)
Temp <- list()
geneNameTemp <- "123"
Temp <- as.list(toDrop[which(toDrop$Gene %in% geneNameTemp),2])
Temp
# [[1]]
# [1] A
# Levels: A B C D E
#
# [[2]]
# [1] B
# Levels: A B C D E
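This returns a list with the two factor values (in R 4.0.0 and later, where data.frame() no longer converts strings to factors by default, the elements are character strings instead). The method also works when geneNameTemp is a vector, but the result will include duplicates if there are any: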
Gene <- c("123", "327", "445", "557", "789", "123", "557")
Taxa <- c("A", "B", "D", "A", "E", "B", "C")
toDrop <- data.frame(Gene, Taxa)
Temp <- list()
geneNameTemp <- c("123", "327")
Temp <- as.list(toDrop[which(toDrop$Gene %in% geneNameTemp),2])
Temp
# [[1]]
# [1] A
# Levels: A B C D E
#
# [[2]]
# [1] B
# Levels: A B C D E
#
# [[3]]
# [1] B
# Levels: A B C D E
If you only need a vector with the factors, you can remove as.list(). If you want to remove the duplicates, you can use unique(toDrop[which(toDrop$Gene %in% geneNameTemp),2]).
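For reference, here is a minimal sketch of the vector and unique() variants on the same example data. The names taxaVec, taxaUnique and taxaByGene are just illustrative, and stringsAsFactors = FALSE is an assumption I've added so the results print as plain character strings. The split() call at the end is not part of the approach above; it is one possible way to keep track of which taxa belong to which gene.

```r
# Same example data as above. stringsAsFactors = FALSE is an assumption added
# here so that Taxa stays a character vector regardless of R version.
Gene <- c("123", "327", "445", "557", "789", "123", "557")
Taxa <- c("A", "B", "D", "A", "E", "B", "C")
toDrop <- data.frame(Gene, Taxa, stringsAsFactors = FALSE)

geneNameTemp <- c("123", "327")

# Plain vector of every matching taxon. %in% already returns TRUE/FALSE for
# each row, so which() is optional here.
taxaVec <- toDrop$Taxa[toDrop$Gene %in% geneNameTemp]
taxaVec
# [1] "A" "B" "B"

# Same result with duplicates dropped
taxaUnique <- unique(taxaVec)
taxaUnique
# [1] "A" "B"

# One list element per gene, named by gene, holding all of its taxa
taxaByGene <- split(toDrop$Taxa, toDrop$Gene)[geneNameTemp]
taxaByGene
```

If you only ever look up one gene at a time inside your loop, the first form is probably all you need.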