I have a list of names and a set of thresholds assigned to those names, which determine whether each name is appropriately assigned.
You can recreate a test dataset using this:
df <- data.frame(level1 = c("Eukaryota","Eukaryota","Eukaryota","Eukaryota","Eukaryota"),
level2=c("Opisthokonta","Alveolata","Opisthokonta","Alveolata","Alveolata"),
level3=c("Fungi","Ciliophora","Fungi","Ciliophora","Dinoflagellata"),
level4=c("Basidiomycota","Spirotrichea","Basidiomycota","Spirotrichea","Dinophyceae"),
value = c("100;5;4;2", "100;100;100;100", "100;80;60;50", "90;50;40;40","100;80;20;0"))
I'd like to use tidyverse mutate() and case_when() to find a taxonomic level that passes a suitable threshold. The tidyverse statement below splits up the threshold values and then attempts to do this.
My bottlenecks:
case_when() versus an ifelse() statement - it may be more appropriate to use ifelse()?
level1:level3 syntax would be painful!

df_updated <- df %>%
separate(value, c("threshold1","threshold2", "threshold3", "threshold4"), sep =";") %>%
mutate(Name_updated = case_when(
threshold4 >= 50 ~ unite(level1:level4, sep = ";"), #Fill with all taxonomic names to level4
threshold4 < 50 & threshold3 >= 60 ~ unite(level1:level3, sep = ";"), #If last threshold is <50, only fill with taxonomic names to level3
threshold4 < 50 & threshold3 < 60 & threshold2 >= 50 ~ unite(level1:level2, sep = ";"), #If thresholds for level 3 and 4 are below, fill only level1;level2
TRUE ~ level1)) %>% #Otherwise fill with only level 1
data.frame
Desired output
> df_updated$Name_updated
# Output of this new list:
Eukaryota
Eukaryota;Alveolata;Ciliophora;Spirotrichea
Eukaryota;Opisthokonta;Fungi;Basidiomycota
Eukaryota;Alveolata
Eukaryota;Alveolata
A desired next step is to write a function that lets the user specify the threshold values used in the script, so I really need the step that determines which threshold passes to be robust.
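The issue is with unite() and also with the type of the columns created by separate(). unite() operates on a whole data frame, so it cannot supply the right-hand side of a case_when() branch, which must be a vector. In addition, separate() uses convert = FALSE by default, so the new threshold columns are of class character and a comparison such as threshold4 >= 50 is done on strings rather than numbers.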
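To see why the column type matters, here is a minimal sketch of the comparison on its own. The vectors `threshold_chr` and `threshold_int` are made-up names for illustration and are not part of the question's data.

```r
# Thresholds as separate() returns them by default: character strings
threshold_chr <- c("100", "5", "60")

# The number 50 is coerced to "50" and the comparison is done on text,
# so "100" sorts before "50" and fails the test
threshold_chr >= 50
# [1] FALSE FALSE  TRUE

# The same thresholds converted to integer compare numerically
threshold_int <- as.integer(threshold_chr)
threshold_int >= 50
# [1]  TRUE FALSE  TRUE
```

This is why the solution below passes convert = TRUE to separate().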
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
df %>%
  type.convert(as.is = TRUE) %>%
  # convert = TRUE makes the threshold columns integer instead of character
  separate(value, c("threshold1", "threshold2",
                    "threshold3", "threshold4"), sep = ";", convert = TRUE) %>%
  mutate(Name_updated =
    case_when(
      # Each branch pastes the selected level columns together row-wise
      threshold4 >= 50 ~
        select(., starts_with('level')) %>%
          reduce(str_c, sep = ";"),
      threshold4 < 50 & threshold3 >= 60 ~
        select(., level1:level3) %>%
          reduce(str_c, sep = ";"),
      threshold4 < 50 & threshold3 < 60 & threshold2 >= 50 ~
        select(., level1:level2) %>%
          reduce(str_c, sep = ";"),
      TRUE ~ level1))
# level1 level2 level3 level4 threshold1 threshold2 threshold3 threshold4
#1 Eukaryota Opisthokonta Fungi Basidiomycota 100 5 4 2
#2 Eukaryota Alveolata Ciliophora Spirotrichea 100 100 100 100
#3 Eukaryota Opisthokonta Fungi Basidiomycota 100 80 60 50
#4 Eukaryota Alveolata Ciliophora Spirotrichea 90 50 40 40
#5 Eukaryota Alveolata Dinoflagellata Dinophyceae 100 80 20 0
# Name_updated
#1 Eukaryota
#2 Eukaryota;Alveolata;Ciliophora;Spirotrichea
#3 Eukaryota;Opisthokonta;Fungi;Basidiomycota
#4 Eukaryota;Alveolata
#5 Eukaryota;Alveolata
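
For the next step mentioned in the question (letting the user choose the threshold values), one possible approach is to wrap the logic in a function. The sketch below is only an illustration: `assign_taxonomy()` and its `cutoffs` argument are hypothetical names, and it assumes the columns are named level1, level2, ... and that every entry of `value` holds one threshold per level. It uses base R rather than case_when(), keeping the deepest level whose threshold meets its cutoff, which is equivalent to the nested conditions above.

```r
# Hypothetical helper (not from any package): keep taxonomic names down to
# the deepest level whose threshold meets the user-supplied cutoff.
# `cutoffs` holds one value per level from level2 onward; level1 is always kept.
assign_taxonomy <- function(df, cutoffs = c(50, 60, 50)) {
  level_cols <- grep("^level", names(df), value = TRUE)
  stopifnot(length(cutoffs) == length(level_cols) - 1)

  # Split "100;80;60;50" into a numeric matrix, one column per level
  thr <- do.call(rbind, lapply(strsplit(as.character(df$value), ";"), as.numeric))
  stopifnot(ncol(thr) == length(level_cols))

  # TRUE where a level's threshold meets its cutoff; level1 always passes
  passes <- cbind(TRUE, sweep(thr[, -1, drop = FALSE], 2, cutoffs, `>=`))

  # Deepest passing level for each row
  depth <- apply(passes, 1, function(p) max(which(p)))

  # Paste the level names from level1 down to that depth
  lev <- as.matrix(df[level_cols])
  df$Name_updated <- vapply(
    seq_len(nrow(df)),
    function(i) paste(lev[i, seq_len(depth[i])], collapse = ";"),
    character(1)
  )
  df
}
```

Called as assign_taxonomy(df) on the test data from the question, this should reproduce the Name_updated column shown above; different cutoffs can be supplied with, for example, assign_taxonomy(df, cutoffs = c(80, 80, 80)).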