I am looking at some antimicrobial resistance data but have come across a problem. Very occasionally there are sensitive/resistant mismatches between two antibiotics that should always agree. In the reprex below, OXA (oxacillin) should always be congruent with FLC (flucloxacillin); the second sample is congruent but the first is not. This is almost certainly a data entry problem from when repeat testing was done, so the sample needs to be excluded.
I don't want NAs to be identified as mismatches either, as sometimes only one of OXA and FLC is tested.
I can check these two antibiotics against each other if I pivot the data to a wide format and then back again, but I am wondering if there is a way to do this conditional checking between rows in the long format, without pivoting backwards and forwards.
library(dplyr)
library(tidyr)
df <- tribble(
  ~id, ~organism, ~antibiotic, ~sensitivity,
  1, "STAUR", "FLC", "R",
  1, "STAUR", "OXA", "S",
  1, "STAUR", "VAN", "S",
  1, "STAUR", "CLI", "S",
  2, "STAUR", "FLC", "S",
  2, "STAUR", "OXA", "S",
  2, "STAUR", "VAN", "S",
  2, "STAUR", "CLI", "S",
  3, "STAUR", "FLC", "S",
  3, "STAUR", "OXA", NA,
  3, "STAUR", "VAN", "S",
  3, "STAUR", "CLI", "R",
  4, "STAUR", "FLC", NA,
  4, "STAUR", "OXA", "S",
  4, "STAUR", "VAN", "S",
  4, "STAUR", "CLI", "R"
)
df %>%
  pivot_wider(
    id_cols = id:organism,
    names_from = antibiotic,
    values_from = sensitivity
  ) %>%
  mutate(mismatch = case_when(FLC != OXA ~ TRUE,
                              FLC == OXA | is.na(FLC) | is.na(OXA) ~ FALSE)) %>%
  pivot_longer(cols = FLC:CLI,
               names_to = "antibiotic",
               values_to = "sensitivity")
#> # A tibble: 16 × 5
#>       id organism mismatch antibiotic sensitivity
#>    <dbl> <chr>    <lgl>    <chr>      <chr>
#>  1     1 STAUR    TRUE     FLC        R
#>  2     1 STAUR    TRUE     OXA        S
#>  3     1 STAUR    TRUE     VAN        S
#>  4     1 STAUR    TRUE     CLI        S
#>  5     2 STAUR    FALSE    FLC        S
#>  6     2 STAUR    FALSE    OXA        S
#>  7     2 STAUR    FALSE    VAN        S
#>  8     2 STAUR    FALSE    CLI        S
#>  9     3 STAUR    FALSE    FLC        S
#> 10     3 STAUR    FALSE    OXA        <NA>
#> 11     3 STAUR    FALSE    VAN        S
#> 12     3 STAUR    FALSE    CLI        R
#> 13     4 STAUR    FALSE    FLC        <NA>
#> 14     4 STAUR    FALSE    OXA        S
#> 15     4 STAUR    FALSE    VAN        S
#> 16     4 STAUR    FALSE    CLI        R
Created on 2024-03-10 with reprex v2.1.0
Within each id, you can subset to the relevant antibiotics and test whether there’s more than one unique sensitivity value. If you use dplyr::n_distinct() with na.rm = TRUE, NA values won't be identified as mismatches.
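To see why this handles missing results, it may help to look at n_distinct() on its own first. The vectors below are made-up FLC/OXA result pairs for illustration, not rows taken from the data:

```r
library(dplyr)

# Hypothetical FLC/OXA result pairs, one vector per sample
n_distinct(c("R", "S"), na.rm = TRUE)  # two different results: a mismatch
n_distinct(c("S", "S"), na.rm = TRUE)  # one result: congruent
n_distinct(c("S", NA), na.rm = TRUE)   # NA is dropped, so one result remains
n_distinct(c("S", NA))                 # without na.rm, NA counts as a value
```

Only the first pair gives a count above 1 once NA values are removed, which is the condition the mismatch flag tests.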
library(dplyr)
df %>%
  mutate(
    mismatch = n_distinct(
      sensitivity[antibiotic %in% c("OXA", "FLC")],
      na.rm = TRUE
    ) > 1,
    .by = id
  )
#> # A tibble: 16 × 5
#>       id organism antibiotic sensitivity mismatch
#>    <dbl> <chr>    <chr>      <chr>       <lgl>
#>  1     1 STAUR    FLC        R           TRUE
#>  2     1 STAUR    OXA        S           TRUE
#>  3     1 STAUR    VAN        S           TRUE
#>  4     1 STAUR    CLI        S           TRUE
#>  5     2 STAUR    FLC        S           FALSE
#>  6     2 STAUR    OXA        S           FALSE
#>  7     2 STAUR    VAN        S           FALSE
#>  8     2 STAUR    CLI        S           FALSE
#>  9     3 STAUR    FLC        S           FALSE
#> 10     3 STAUR    OXA        <NA>        FALSE
#> 11     3 STAUR    VAN        S           FALSE
#> 12     3 STAUR    CLI        R           FALSE
#> 13     4 STAUR    FLC        <NA>        FALSE
#> 14     4 STAUR    OXA        S           FALSE
#> 15     4 STAUR    VAN        S           FALSE
#> 16     4 STAUR    CLI        R           FALSE
Created on 2024-03-10 with reprex v2.1.0
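Since the mismatched samples need to be excluded rather than just flagged, the same condition could probably go straight into filter() instead of mutate(). This is a minimal sketch, assuming dplyr 1.1.0 or later for the .by argument; df_small and df_clean are names made up here, and df_small is a cut-down version of the example data:

```r
library(dplyr)

# Cut-down example data: id 1 is a mismatch, id 2 is congruent,
# id 3 has only one of the two antibiotics tested
df_small <- tribble(
  ~id, ~antibiotic, ~sensitivity,
  1, "FLC", "R",
  1, "OXA", "S",
  1, "VAN", "S",
  2, "FLC", "S",
  2, "OXA", "S",
  3, "FLC", "S",
  3, "OXA", NA
)

# Keep only the ids with at most one distinct non-missing FLC/OXA result
df_clean <- df_small %>%
  filter(
    n_distinct(
      sensitivity[antibiotic %in% c("OXA", "FLC")],
      na.rm = TRUE
    ) <= 1,
    .by = id
  )
```

The condition evaluates to a single TRUE or FALSE per id, so every row of a mismatched sample is dropped together, including the antibiotics that were not part of the check.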