Tags: r, dataframe, dplyr, grepl

Replace a specific part of a column when another column's value is below a threshold in R


I have a data frame such as:

COL1                           COL2
Canis_lupus1                   10
Cattus_cattus2                 10
Betta_splendes3                30
Rattus_domesticus_norvegicus3  20
Canis_lupus_OK                 90
Betta_splendes32               54
Canis_lupus_lupus              18

I would like to replace the Canis_lupus part of each COL1 value with homo_sapiens when COL2 is < 20.

Then I should get:

COL1                           COL2
homo_sapiens1                  10
Cattus_cattus2                 10
Betta_splendes3                30
Rattus_domesticus_norvegicus3  20
Canis_lupus_OK                 90
Betta_splendes32               54
homo_sapiens_lupus             18

Here is the data frame as dput() output:

structure(list(COL1 = structure(c(5L, 6L, 1L, 7L, 4L, 2L, 3L), .Label = c("Betta_splendes3", 
"Betta_splendes32", "Canis_lupus_lupus", "Canis_lupus_OK", "Canis_lupus1", 
"Cattus_cattus2", "Rattus_domesticus_norvegicus3"), class = "factor"), 
    COL2 = c(10L, 10L, 30L, 20L, 90L, 54L, 18L)), class = "data.frame", row.names = c(NA, 
-7L))

Solution

  • You can use the following solution:

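    library(dplyr)
    library(stringr)
    
    df %>% 
      # COL1 is stored as a factor in the example, so convert it to character first
      mutate(COL1 = as.character(COL1),
             # str_replace() is vectorised and leaves non-matching strings unchanged,
             # so a plain character column is returned (no list column, no purrr needed)
             COL1 = if_else(COL2 < 20,
                            str_replace(COL1, "Canis_lupus", "homo_sapiens"),
                            COL1))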
    
    
                               COL1 COL2
    1                 homo_sapiens1   10
    2                Cattus_cattus2   10
    3               Betta_splendes3   30
    4 Rattus_domesticus_norvegicus3   20
    5                Canis_lupus_OK   90
    6              Betta_splendes32   54
    7            homo_sapiens_lupus   18
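
Since the question is also tagged grepl, the same replacement can probably be done in base R without any packages. The following is only a minimal sketch, not part of the original answer: it assumes COL1 is held as character rather than factor (the data frame is rebuilt that way here), and the helper name `hit` is made up for illustration.

```r
# Rebuild the example data from the question; COL1 is kept as character here
# (assumption: the original dput() stores it as a factor)
df <- data.frame(
  COL1 = c("Canis_lupus1", "Cattus_cattus2", "Betta_splendes3",
           "Rattus_domesticus_norvegicus3", "Canis_lupus_OK",
           "Betta_splendes32", "Canis_lupus_lupus"),
  COL2 = c(10L, 10L, 30L, 20L, 90L, 54L, 18L),
  stringsAsFactors = FALSE
)

# Rows where COL1 contains the literal text and COL2 is below the threshold
hit <- grepl("Canis_lupus", df$COL1, fixed = TRUE) & df$COL2 < 20

# Replace the first occurrence of the text in those rows only
df$COL1[hit] <- sub("Canis_lupus", "homo_sapiens", df$COL1[hit], fixed = TRUE)

df
```

Using fixed = TRUE treats the search text literally rather than as a regular expression, which seems sufficient here because the pattern contains no special characters. If COL1 is a factor, wrapping it in as.character() before the assignment should avoid invalid factor level warnings.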