Search code examples
rextract

Extract a value following conditions


I have this following example dataset with two variables (Ip and Ge) that could contains one, two (or more) values separated by ";". Each value in Ip is associated to the value of Ge following their order (ex: line3: Abri is associated with Ibr, and Echo is associated with Tyh)

Data_test= data.frame(Ip=c("Abri", "Abri", "Abri;Echo", "Echo", "Abri;Abri", "Echo;Abri", "Abri", "Abri;Abri;Echo", "Abri;Abri"), 
           Ge=c("Aju", "Jlt", "Ibr;Tyh", "Rtj", "Ekm;Bty", "Itz;Orv", "Adr", "Dhu;Vfr;Lnh", "Tde;Ijp"), 
           Ca=c(2,13,4,2, 4, 6, 13, 13, 4), Mp=c(5,5,4,3,10,1,17, 15, 2))

Data_test

              Ip          Ge Ca Mp
1           Abri         Aju  2  5
2           Abri         Jlt 13  5
3      Abri;Echo     Ibr;Tyh  4  4
4           Echo         Rtj  2  3
5      Abri;Abri     Ekm;Bty  4 10
6      Echo;Abri     Itz;Orv  6  1
7           Abri         Adr 13 17
8 Abri;Abri;Echo Dhu;Vfr;Lnh 13 15
9      Abri;Abri     Tde;Ijp  4  2


I would like to extract in another variable, the values in Ge depending on the value in Ip, Ca and Mp following these rules:

-> (1) Extract all values associated with Echo
-> (2) Extract all values associated with Abri and (with Ca>9 and/or Mp>8)

The output would be

> Output_test
              Ip          Ge Ca Mp     Outpout
1           Abri         Aju  2  5        <NA>
2           Abri         Jlt 13  5         Jlt
3      Abri;Echo     Ibr;Tyh  4  4         Tyh
4           Echo         Rtj  2  3         Rtj
5      Abri;Abri     Ekm;Bty  4 10     Ekm;Bty
6      Echo;Abri     Itz;Orv  6  1         Itz
7           Abri         Adr 13 17         Adr
8 Abri;Abri;Echo Dhu;Vfr;Lnh 13 15 Dhu;Vfr;Lnh
9      Abri;Abri     Tde;Ijp  4  2        <NA>

I tried to do that using mutate and ifelse, but I dont know how to deal with the case when Ip and Ge contain more than one values. Any help will be appreciated

library(tidyverse)

Data_test %>% 
  mutate(Output=ifelse(Ip=="Echo", Ge, 
                       ifelse(Ip=="Abri" & (Ca>9 | Mp>8), Ge, 
                       ifelse( , , ))))

Solution

  • Using pmap():

    Steps:

    1. Split Ip and Ge on colons, turning them into the list columns i and g
    2. use pmap() to loop through each row, getting the 'Echo' values, then the 'Abri' values, and collapsing them back into a character vector. For some reason, I get ""s instead of NAs for ones which don't match, so I need to turn ""s into NAs using an ifelse()
    3. Drop i and g
    library(tidyverse)
    
    Data_test |> 
      mutate(i = strsplit(Ip, ";"),
             g = strsplit(Ge, ";"),
             Output = pmap_chr(list(i, g, Ca, Mp), \(i, g, a, m) g[i == "Echo" | (i == "Abri" & (a > 9 | m > 8))] %>% paste(collapse = ";") %>% {ifelse(. == "", NA, .)})) |>
      select(-i, -g)
    

    Output:

    # A tibble: 9 × 5
      Ip             Ge             Ca    Mp Output     
      <chr>          <chr>       <dbl> <dbl> <chr>      
    1 Abri           Aju             2     5 NA         
    2 Abri           Jlt            13     5 Jlt        
    3 Abri;Echo      Ibr;Tyh         4     4 Tyh        
    4 Echo           Rtj             2     3 Rtj        
    5 Abri;Abri      Ekm;Bty         4    10 Ekm;Bty    
    6 Echo;Abri      Itz;Orv         6     1 Itz        
    7 Abri           Adr            13    17 Adr        
    8 Abri;Abri;Echo Dhu;Vfr;Lnh    13    15 Dhu;Vfr;Lnh
    9 Abri;Abri      Tde;Ijp         4     2 NA