So I have this data frame with three columns: species (the species name in that row); synonyms (a synonym species name that occurs in at least one instance of the species column in the data frame); and groups, which is basically groups of occurrences (each group can have one or more species names). This is a reproducible example of that df:
df <- data.frame(
species = c("Species X","Species A", "Species Z", "Species A", "Species B", "Species C", "Species C", "Species D", "Species D", "Species A", "Species B", "Species E","Species Y","Species W","Species R"),
synonyms = c("Species Y","Species B", "no_synonym", "Species B", "Species A", "Species E", "Species E", "no_synonym", "no_synonym", "Species B", "Species A", "Species C","Species X","Species R","Species W"),
groups = c("G1","G1", "G1", "G1", "G1", "G2", "G2", "G3", "G3", "G1", "G4", "G5","G6","G7","G8")
)
I am trying to create a new column with "yes" or "no" that checks, for each row, whether its group contains both that row's species and its synonym in the species column. For instance, in group "G1" the Species A and Species B rows would be "yes", because "G1" has both Species A and Species B as instances in the species column, and they are synonyms of each other. Species Z, although it is in "G1", has no synonym, so it should be "no". Species X has a synonym in the df (the same goes for Species Y; they are each other's synonyms), but there is no row with its synonym in the species column of its group, "G1".
"G2" would be "no" because, even though Species E is a synonym of Species C, "G2" doesn't have instances of both Species E and Species C in the species column.
The species that have "no_synonym" would also be "no".
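Put another way, a row should be "yes" exactly when the pair (group, synonym) also appears as a pair (group, species) somewhere in the df. The following is a minimal base R sketch of that rule, not taken from the question. It assumes the separator "||" never occurs inside a group or species name, and the helper names key_syn and key_sp are hypothetical.

```r
df <- data.frame(
  species = c("Species X","Species A", "Species Z", "Species A", "Species B", "Species C", "Species C", "Species D", "Species D", "Species A", "Species B", "Species E","Species Y","Species W","Species R"),
  synonyms = c("Species Y","Species B", "no_synonym", "Species B", "Species A", "Species E", "Species E", "no_synonym", "no_synonym", "Species B", "Species A", "Species C","Species X","Species R","Species W"),
  groups = c("G1","G1", "G1", "G1", "G1", "G2", "G2", "G3", "G3", "G1", "G4", "G5","G6","G7","G8")
)

# Hypothetical helper keys: one "group||name" string per row.
# Assumes "||" never appears inside a group or species name.
key_syn <- paste(df$groups, df$synonyms, sep = "||")
key_sp  <- paste(df$groups, df$species,  sep = "||")

# A row is "yes" when its synonym occurs as a species in the same group.
# "no_synonym" never occurs in species, so those rows become "no".
df$at_least_two_synonyms_in_group <- ifelse(key_syn %in% key_sp, "yes", "no")
df
```

Because the group is baked into the key, no explicit grouping step is needed, and each row gets its own answer instead of one value per group.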
Basically this would be the output:
df <- data.frame(
species = c("Species X","Species A", "Species Z", "Species A", "Species B", "Species C", "Species C", "Species D", "Species D", "Species A", "Species B", "Species E","Species Y","Species W","Species R"),
synonyms = c("Species Y","Species B", "no_synonym", "Species B", "Species A", "Species E", "Species E", "no_synonym", "no_synonym", "Species B", "Species A", "Species C","Species X","Species R","Species W"),
groups = c("G1","G1", "G1", "G1", "G1", "G2", "G2", "G3", "G3", "G1", "G4", "G5","G6","G7","G8"),
at_least_two_synonyms_in_group=c("no","yes","no","yes","yes","no","no","no","no","yes","no","no","no","no","no"))
I tried using dplyr, but I'm not getting the output I expect. For instance, the first and third rows get "yes", but they should be "no": even though Species Y is a synonym of Species X and occurs somewhere in the df, it doesn't occur in "G1". Similarly, Species Z doesn't even have a synonym in the df, so it should be "no" as well.
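library(dplyr)

df <- df %>%
  group_by(groups) %>%
  mutate(
    # any(), n_distinct() and intersect() summarise the whole group,
    # so this condition is a single TRUE/FALSE recycled to every row
    at_least_two_synonyms_in_group = ifelse(
      any(synonyms %in% species) & any(species %in% synonyms) & n_distinct(intersect(synonyms, species)) >= 2,
      "yes",
      "no"
    )
  ) %>%
  ungroup()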
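To see why rows 1 and 3 come out as "yes", the attempted condition can be evaluated by hand on the "G1" rows alone. This base R sketch substitutes length(unique()) for n_distinct() so it runs without dplyr; the objects g1, cond and per_row are hypothetical names used only for illustration.

```r
df <- data.frame(
  species = c("Species X","Species A", "Species Z", "Species A", "Species B", "Species C", "Species C", "Species D", "Species D", "Species A", "Species B", "Species E","Species Y","Species W","Species R"),
  synonyms = c("Species Y","Species B", "no_synonym", "Species B", "Species A", "Species E", "Species E", "no_synonym", "no_synonym", "Species B", "Species A", "Species C","Species X","Species R","Species W"),
  groups = c("G1","G1", "G1", "G1", "G1", "G2", "G2", "G3", "G3", "G1", "G4", "G5","G6","G7","G8")
)

# Rows of group "G1" only (what group_by() hands to mutate() for that group)
g1 <- df[df$groups == "G1", ]

# The attempted condition: every term summarises the whole group
cond <- any(g1$synonyms %in% g1$species) &
  any(g1$species %in% g1$synonyms) &
  length(unique(intersect(g1$synonyms, g1$species))) >= 2

length(cond)  # 1: one value for the whole group
cond          # TRUE, so "yes" is recycled to all six G1 rows

# A row-wise test keeps one value per row instead
per_row <- g1$synonyms %in% g1$species
per_row       # FALSE TRUE FALSE TRUE TRUE TRUE
```

The group-level summary cannot tell Species X or Species Z apart from Species A and Species B; only a test that keeps one result per row can.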
Can you reduce the logic to just checking whether each row's synonyms value is in the species column of its group?
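library(dplyr)

df %>%
  group_by(groups) %>%
  mutate(
    # inside a group, species is that group's species vector, so %in%
    # gives one TRUE/FALSE per row; "no_synonym" never matches
    ingroup = if_else(synonyms %in% species, "yes", "no")
  ) %>%
  ungroup()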
# species synonyms groups at_least_two_synonyms_in_group ingroup
# 1 Species X Species Y G1 no no
# 2 Species A Species B G1 yes yes
# 3 Species Z no_synonym G1 no no
# 4 Species A Species B G1 yes yes
# 5 Species B Species A G1 yes yes
# 6 Species C Species E G2 no no
# 7 Species C Species E G2 no no
# 8 Species D no_synonym G3 no no
# 9 Species D no_synonym G3 no no
# 10 Species A Species B G1 yes yes
# 11 Species B Species A G4 no no
# 12 Species E Species C G5 no no
# 13 Species Y Species X G6 no no