I am trying to filter and combine some individuals from a large list in a dataset containing 100 thousand rows. They are stored in one column named "IndividualsObserved" and appear as a string of characters like this:
IndividualsObserved
c("Azur";"Bleue";"Noir";"Azur","Bleue","Ivoire","Fitz","Gloria","Tyler")*
*There are many more individuals that could be present in different combinations, but I am only interested in the ones detailed below.
I would like to know how I can use dplyr to create another column (with mutate) that ONLY contains certain individuals like, for example, the following: "Azur", "Bleue", "Ivoire", "Fitz", "Gloria", and "Tyler". The rest can be ignored. I would like this column to be named "IndividualsFiltered"
These individuals would be the following:
IndividualsFiltered
"Azur", "Bleue", "Ivoire", "Fitz", "Gloria", "Tyler"
(The individuals that are not selected are very abundant; could they be dropped automatically whenever they are not one of those listed above, without having to name each of them?)
Secondly, I would like to create two types of additional columns that each combine two individuals (dyads), according to two criteria based on group membership.
The first three individuals belong to GroupA (Azur, Bleue, Ivoire) and the last three belong to GroupB (Fitz, Gloria, Tyler).
Only the individuals kept in IndividualsFiltered should be considered when building the dyads.
Taking the last entry as an example, the newly created IndividualsFiltered would be c("Azur","Bleue","Ivoire","Fitz","Gloria","Tyler").
The two criteria would be the following:
Individuals of the same group appear in separate new columns, in pairs of two individuals that belong to the same group. These columns could be named Dyad1GroupA, Dyad2GroupA, etc. (depending on how many combinations of two individuals are possible), with the group (A or B) given in the column name. There would be a total of 6 dyads, 3 per group.
For GroupA the dyads would be:
Dyad1GroupA = "Azur";"Bleue"
Dyad2GroupA = "Azur";"Ivoire"
Dyad3GroupA = "Bleue";"Ivoire"
For GroupB the dyads would be:
Dyad1GroupB = "Fitz";"Gloria"
Dyad2GroupB = "Fitz";"Tyler"
Dyad3GroupB = "Gloria";"Tyler"
Individuals of different groups appear in separate new columns in groups of two individuals. The same logic applies to the new columns that, in this case, could be named Dyad1GroupAB, Dyad2GroupAB, etc.
So the resulting dyads would be:
Dyad1GroupAB for "Azur";"Fitz"
Dyad2GroupAB for "Azur";"Gloria"
Dyad3GroupAB for "Azur";"Tyler"
Dyad4GroupAB for "Bleue";"Fitz"
Dyad5GroupAB for "Bleue";"Gloria"
Dyad6GroupAB for "Bleue";"Tyler"
Dyad7GroupAB for "Ivoire";"Fitz"
Dyad8GroupAB for "Ivoire";"Gloria"
Dyad9GroupAB for "Ivoire";"Tyler"
Thanks a lot if you have some ideas about possible approaches, sorry if I did not vote for previous comments I received, but I am not allowed yet (not been registered long enough).
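library(tibble)  # provides tibble(); dplyr and purrr are called via :: below
# Example data: a list-column holding one character vector per row
df <- tibble(individuals = list(c("Azur","Bleue","Noir","Azur","Bleue","Ivoire","Fitz","Gloria","Tyler")))
certain_individuals <- c("Azur", "Bleue", "Ivoire", "Fitz", "Gloria", "Tyler")
# Keep only the elements of each vector that are listed in certain_individuals
dplyr::mutate(df, individuals = purrr::map(individuals, ~ .x[.x %in% certain_individuals]))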
Output:
# A tibble: 1 × 1
individuals
<list>
1 <chr [8]>
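Note that the result has 8 elements rather than 6, because "Azur" and "Bleue" occur twice in the input. If the goal is the six distinct names in a new column called IndividualsFiltered, as described in the question, a minimal sketch could look like the following. It assumes the column is a list-column of character vectors, as in the example above; if the real data stores everything as one long string, it would have to be split first. The name `result` is only illustrative.

```r
library(tibble)
library(dplyr)
library(purrr)

# Assumed layout: one character vector per row in a list-column
df <- tibble(IndividualsObserved = list(
  c("Azur", "Bleue", "Noir", "Azur", "Bleue", "Ivoire", "Fitz", "Gloria", "Tyler")
))
certain_individuals <- c("Azur", "Bleue", "Ivoire", "Fitz", "Gloria", "Tyler")

# Keep the wanted individuals, drop repeats, and store them in a new column
result <- df %>%
  mutate(IndividualsFiltered = map(
    IndividualsObserved,
    ~ unique(.x[.x %in% certain_individuals])
  ))

result$IndividualsFiltered[[1]]
```

Anything not named in `certain_individuals` is dropped without having to be listed, and `unique()` keeps the order of first appearance.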
I would make the second part a separate question.
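That said, as a rough starting point for the dyads, the pair labels themselves can be enumerated in base R. This is only a sketch under assumptions: the question does not say what each dyad column should contain, so the code below just builds the pairs and the column names. The helper `pairs_within` and the variable names are hypothetical.

```r
group_a <- c("Azur", "Bleue", "Ivoire")
group_b <- c("Fitz", "Gloria", "Tyler")

# Within-group dyads: every unordered pair of members from one group
pairs_within <- function(members) {
  combn(members, 2, FUN = paste, collapse = ";")
}
dyads_a <- pairs_within(group_a)
dyads_b <- pairs_within(group_b)
names(dyads_a) <- paste0("Dyad", seq_along(dyads_a), "GroupA")
names(dyads_b) <- paste0("Dyad", seq_along(dyads_b), "GroupB")

# Between-group dyads: each member of GroupA paired with each member of GroupB.
# outer() builds a 3 x 3 matrix; t() plus as.vector() reads it row by row,
# so all pairs for the first GroupA member come first.
dyads_ab <- as.vector(t(outer(group_a, group_b, paste, sep = ";")))
names(dyads_ab) <- paste0("Dyad", seq_along(dyads_ab), "GroupAB")

dyads_a
dyads_b
dyads_ab
```

This gives 3 dyads per group and 9 between-group dyads, in the same order as listed in the question.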