Row-wise conditional comparisons


I am looking at some antimicrobial resistance data but have come across a problem. Very occasionally there are sensitive/resistant mismatches between two antibiotics whose results should always agree. In the reprex below, OXA (oxacillin) should always be congruent with FLC (flucloxacillin); the second sample (id 2) is congruent, but the first (id 1) is not. This is almost certainly a data entry problem from when repeat testing was done, so the sample needs to be excluded.

I don't want NAs to be identified as mismatches either, as sometimes only one of OXA and FLC is tested.

I can check these two antibiotics against each other if I pivot the data to a wide format and then back again, but I am wondering whether there is a way to do this conditional check between rows in the long format, without pivoting back and forth.

library(dplyr)
library(tidyr)

df <- tribble(
  ~id, ~organism, ~antibiotic, ~sensitivity,
  1, "STAUR", "FLC", "R",
  1, "STAUR", "OXA", "S",
  1, "STAUR", "VAN", "S",
  1, "STAUR", "CLI", "S",
  2, "STAUR", "FLC", "S",
  2, "STAUR", "OXA", "S",
  2, "STAUR", "VAN", "S",
  2, "STAUR", "CLI", "S",
  3, "STAUR", "FLC", "S",
  3, "STAUR", "OXA", NA,
  3, "STAUR", "VAN", "S",
  3, "STAUR", "CLI", "R",
  4, "STAUR", "FLC", NA,
  4, "STAUR", "OXA", "S",
  4, "STAUR", "VAN", "S",
  4, "STAUR", "CLI", "R"
)
  
df %>% 
  pivot_wider(
    id_cols = id:organism,
    names_from = antibiotic,
    values_from = sensitivity
  ) %>%
  mutate(mismatch = case_when(FLC != OXA ~ TRUE,
                              FLC == OXA | is.na(FLC) | is.na(OXA) ~ FALSE)) %>% 
  pivot_longer(cols = FLC:CLI,
               names_to = "antibiotic",
               values_to = "sensitivity")


#> # A tibble: 16 × 5
#>       id organism mismatch antibiotic sensitivity
#>    <dbl> <chr>    <lgl>    <chr>      <chr>      
#>  1     1 STAUR    TRUE     FLC        R          
#>  2     1 STAUR    TRUE     OXA        S          
#>  3     1 STAUR    TRUE     VAN        S          
#>  4     1 STAUR    TRUE     CLI        S          
#>  5     2 STAUR    FALSE    FLC        S          
#>  6     2 STAUR    FALSE    OXA        S          
#>  7     2 STAUR    FALSE    VAN        S          
#>  8     2 STAUR    FALSE    CLI        S          
#>  9     3 STAUR    FALSE    FLC        S          
#> 10     3 STAUR    FALSE    OXA        <NA>       
#> 11     3 STAUR    FALSE    VAN        S          
#> 12     3 STAUR    FALSE    CLI        R          
#> 13     4 STAUR    FALSE    FLC        <NA>       
#> 14     4 STAUR    FALSE    OXA        S          
#> 15     4 STAUR    FALSE    VAN        S          
#> 16     4 STAUR    FALSE    CLI        R

Created on 2024-03-10 with reprex v2.1.0


Solution

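  • Within each id, you can subset sensitivity to the relevant antibiotics and test whether there is more than one unique value. If you use dplyr::n_distinct() with na.rm = TRUE, NA values are dropped before counting, so they won't be identified as mismatches. The .by argument used here requires dplyr 1.1.0 or later; on older versions, use group_by(id) followed by ungroup() instead.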

    library(dplyr)
    
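    df %>%
      mutate(
        # Count the distinct non-NA OXA/FLC results within each id;
        # more than one distinct value means the two tests disagree.
        mismatch = n_distinct(
          sensitivity[antibiotic %in% c("OXA", "FLC")],
          na.rm = TRUE
        ) > 1,
        .by = id
      )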
    #> # A tibble: 16 × 5
    #>       id organism antibiotic sensitivity mismatch
    #>    <dbl> <chr>    <chr>      <chr>       <lgl>   
    #>  1     1 STAUR    FLC        R           TRUE    
    #>  2     1 STAUR    OXA        S           TRUE    
    #>  3     1 STAUR    VAN        S           TRUE    
    #>  4     1 STAUR    CLI        S           TRUE    
    #>  5     2 STAUR    FLC        S           FALSE   
    #>  6     2 STAUR    OXA        S           FALSE   
    #>  7     2 STAUR    VAN        S           FALSE   
    #>  8     2 STAUR    CLI        S           FALSE   
    #>  9     3 STAUR    FLC        S           FALSE   
    #> 10     3 STAUR    OXA        <NA>        FALSE   
    #> 11     3 STAUR    VAN        S           FALSE   
    #> 12     3 STAUR    CLI        R           FALSE   
    #> 13     4 STAUR    FLC        <NA>        FALSE   
    #> 14     4 STAUR    OXA        S           FALSE   
    #> 15     4 STAUR    VAN        S           FALSE   
    #> 16     4 STAUR    CLI        R           FALSE
    

    Created on 2024-03-10 with reprex v2.1.0
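
Since the stated goal is to exclude the mismatched samples, the same test could also be used directly inside filter() rather than stored as a column. The following is only a minimal sketch, assuming dplyr 1.1.0 or later (for .by) and a cut-down version of the question's data; the names `small` and `clean` are made up for illustration. The first two lines show why na.rm = TRUE matters here.

```r
library(dplyr)

# By default n_distinct() counts NA as its own value; na.rm = TRUE drops it first.
n_distinct(c("S", NA))                # 2
n_distinct(c("S", NA), na.rm = TRUE)  # 1

# Cut-down version of the question's data (hypothetical name `small`).
small <- tribble(
  ~id, ~antibiotic, ~sensitivity,
  1, "FLC", "R",
  1, "OXA", "S",
  1, "VAN", "S",
  2, "FLC", "S",
  2, "OXA", "S",
  2, "VAN", "S",
  3, "FLC", "S",
  3, "OXA", NA,
  3, "VAN", "S"
)

# Keep only ids whose non-NA OXA/FLC results agree
# (this includes ids where one or both results are missing).
clean <- small %>%
  filter(
    n_distinct(sensitivity[antibiotic %in% c("OXA", "FLC")], na.rm = TRUE) <= 1,
    .by = id
  )
```

Here id 1 is dropped entirely, while ids 2 and 3 are kept. An id where both OXA and FLC are NA gives a count of 0 and is kept too, which seems consistent with not treating NAs as mismatches.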