Fill a column with unite() using mutate() and case_when() in R (tidyverse)


I have a list of names and thresholds assigned to those names, which determine whether each name is appropriately assigned.

You can recreate a test dataset using this:

df <- data.frame(level1 = c("Eukaryota","Eukaryota","Eukaryota","Eukaryota","Eukaryota"), 
             level2=c("Opisthokonta","Alveolata","Opisthokonta","Alveolata","Alveolata"), 
             level3=c("Fungi","Ciliophora","Fungi","Ciliophora","Dinoflagellata"),
             level4=c("Basidiomycota","Spirotrichea","Basidiomycota","Spirotrichea","Dinophyceae"), 
             value = c("100;5;4;2", "100;100;100;100", "100;80;60;50", "90;50;40;40","100;80;20;0"))

I'd like to use tidyverse mutate() and case_when() to find the taxonomic level that passes a suitable threshold. The tidyverse statement below splits up the threshold values and then attempts to do this. My bottlenecks:

  1. Using case_when() versus an ifelse() statement: would ifelse() be more appropriate?
  2. I can't figure out how to fill the new column, Name_updated, with a concatenated level1-levelX. As written, unite() is not appropriate, because it operates on whole data frames. In reality I have many more columns, so doing this without the tidyverse level1:level3 syntax would be painful!

df_updated <- df %>% 
  separate(value, c("threshold1","threshold2", "threshold3", "threshold4"), sep =";") %>% 
  mutate(Name_updated = case_when(
    threshold4 >= 50 ~ unite(level1:level4, sep = ";"), #Fill with all taxonomic names to level4
    threshold4 < 50 & threshold3 >= 60 ~ unite(level1:level3, sep = ";"), #If last threshold is <50, only fill with taxonomic names to level3
    threshold4 < 50 & threshold3 < 60 & threshold2 >= 50 ~ unite(level1:level2, sep = ";"), #If thresholds for level 3 and 4 are below, fill only level1;level2
    TRUE ~ level1)) %>% #Otherwise fill with only level 1
  data.frame

Desired output

> df_updated$Name_updated
# Output of this new list:
Eukaryota
Eukaryota;Alveolata;Ciliophora;Spirotrichea
Eukaryota;Opisthokonta;Fungi;Basidiomycota
Eukaryota;Alveolata
Eukaryota;Alveolata

A desired next step is to write a function that lets the user specify the threshold values used in the script, so the logic that determines which threshold passes needs to be robust.


Solution

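  • There are two issues: how unite() is used, and the type of the separated columns. unite() takes a whole data frame and returns a data frame, so it cannot supply the vector that case_when() needs on the right-hand side of each condition. And because separate() defaults to convert = FALSE, the threshold columns are character, so a comparison such as "100" >= 50 is done as text and returns FALSE. The code below pastes the selected level columns together with reduce() and str_c(), and sets convert = TRUE so the thresholds are numeric.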

    library(dplyr)
    library(tidyr)
    library(purrr)
    library(stringr)
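    df %>% 
      # make sure the text columns are character rather than factor
      type.convert(as.is = TRUE) %>%
      # convert = TRUE makes the four threshold columns numeric
      separate(value, c("threshold1", "threshold2", "threshold3", "threshold4"),
               sep = ";", convert = TRUE) %>% 
      mutate(Name_updated = case_when(
        # level 4 passes: join all level columns
        threshold4 >= 50 ~
          select(., starts_with("level")) %>% reduce(str_c, sep = ";"),
        # level 4 fails, level 3 passes: join level1 to level3
        threshold4 < 50 & threshold3 >= 60 ~
          select(., level1:level3) %>% reduce(str_c, sep = ";"),
        # levels 3 and 4 fail, level 2 passes: join level1 and level2
        threshold4 < 50 & threshold3 < 60 & threshold2 >= 50 ~
          select(., level1:level2) %>% reduce(str_c, sep = ";"),
        # otherwise keep only level1
        TRUE ~ level1))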
    #  level1       level2         level3        level4 threshold1 threshold2 threshold3 threshold4
    #1 Eukaryota Opisthokonta          Fungi Basidiomycota        100          5          4          2
    #2 Eukaryota    Alveolata     Ciliophora  Spirotrichea        100        100        100        100
    #3 Eukaryota Opisthokonta          Fungi Basidiomycota        100         80         60         50
    #4 Eukaryota    Alveolata     Ciliophora  Spirotrichea         90         50         40         40
    #5 Eukaryota    Alveolata Dinoflagellata   Dinophyceae        100         80         20          0
    #                                 Name_updated
    #1                                   Eukaryota
    #2 Eukaryota;Alveolata;Ciliophora;Spirotrichea
    #3  Eukaryota;Opisthokonta;Fungi;Basidiomycota
    #4                         Eukaryota;Alveolata
    #5                         Eukaryota;Alveolata
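
For the follow-up in the question (letting the user choose the thresholds, with more level columns than four), one possible approach is to compute, per row, the deepest level whose threshold meets its cutoff and then paste the names up to that level. The sketch below is only an illustration under some assumptions: the helper name `name_by_threshold()` and its arguments are hypothetical, the level columns are assumed to appear in order (level1, level2, ...), and `value` is assumed to hold exactly one threshold per level. It uses base R for the row-wise step rather than case_when(), so it is a different technique from the answer above, though it follows the same rules.

```r
# Test data from the question
df <- data.frame(
  level1 = c("Eukaryota", "Eukaryota", "Eukaryota", "Eukaryota", "Eukaryota"),
  level2 = c("Opisthokonta", "Alveolata", "Opisthokonta", "Alveolata", "Alveolata"),
  level3 = c("Fungi", "Ciliophora", "Fungi", "Ciliophora", "Dinoflagellata"),
  level4 = c("Basidiomycota", "Spirotrichea", "Basidiomycota", "Spirotrichea", "Dinophyceae"),
  value  = c("100;5;4;2", "100;100;100;100", "100;80;60;50", "90;50;40;40", "100;80;20;0")
)

# Hypothetical helper (not from the original answer).
# cutoffs: one minimum value per level column, in column order.
# A row keeps names down to the deepest level whose threshold >= its cutoff;
# level 1 is always kept as the fallback.
name_by_threshold <- function(df, cutoffs, level_prefix = "level",
                              value_col = "value", sep = ";") {
  # Assumption: level columns are named <prefix><number> and already in order
  level_cols <- grep(paste0("^", level_prefix, "[0-9]+$"), names(df), value = TRUE)
  stopifnot(length(level_cols) == length(cutoffs))

  # Split "100;5;4;2" into numeric thresholds, one matrix row per data row
  parts <- strsplit(as.character(df[[value_col]]), sep, fixed = TRUE)
  stopifnot(all(lengths(parts) == length(cutoffs)))
  scores <- do.call(rbind, lapply(parts, as.numeric))

  # Compare each column of thresholds with its own cutoff
  passes <- sweep(scores, 2, cutoffs, FUN = ">=")

  # Deepest passing level per row, never shallower than level 1
  depth <- apply(passes, 1, function(p) max(c(1L, which(p))))

  # Paste the level names from level 1 down to that depth
  level_names <- as.matrix(df[level_cols])
  df$Name_updated <- vapply(
    seq_len(nrow(df)),
    function(i) paste(level_names[i, seq_len(depth[i])], collapse = sep),
    character(1)
  )
  df
}

# Same cutoffs as the case_when() version: 50 for level 2, 60 for level 3,
# 50 for level 4 (the level 1 cutoff is irrelevant because of the fallback)
res <- name_by_threshold(df, cutoffs = c(0, 50, 60, 50))
res$Name_updated
```

With these cutoffs the result should match the desired output in the question. Because the cutoffs are a plain vector, adding a fifth level only means adding a fifth column, a fifth value in `value`, and a fifth cutoff.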