Search code examples
rdataframeconditional-statementsmultiple-columnsna

How to turn specific values of a column into NAs based on the condition of a letter existing in another column?


I have this example df

df = data.frame(id = c('1E','1F','2E','2F','3F','4F'),
                statut = c('A','A','B','B','A','A') )

For each id that has the number repeated, I wish that the statut becomes NA when the letter of the repeated id is E

I am looking to have the following output:

  id statut
1 1E   <NA>
2 1F      A
3 2E   <NA>
4 2F      B
5 3F      A
6 4F      A

Solution

  • Here's a set of conditions that match your expected output:

    idx <- substr(df$id, 1, 1) == dplyr::lead(substr(df$id, 1, 1)) & substr(df$id, 2, 2) == "E"
    df$statut[idx] <- NA
    
      id statut
    1 1E   <NA>
    2 1F      A
    3 2E   <NA>
    4 2F      B
    5 3F      A
    6 4F      A
    

    Or in a more dplyr friendly pipeline:

    library(tidyr)
    library(dplyr)
    df %>%
      separate_wider_regex(id, c(number = "\\d+", letter = ".*")) %>% 
      group_by(number) %>% 
      mutate(statut = ifelse(n() > 1 & letter == "E", NA, statut))