How to identify value change in one column in R?


I want to identify and extract the subjects who changed from an old drug to a new drug. In the code below there are two types of drugs, A and B: type A is the old drug and type B is the new drug, and type B comes in several brands: 2, 3 and 4.
Over time, each person can show one of 3 patterns of drug change:

  1. Patient 11 changed from type A to type B. Only a change from A to B counts; a change from B to A is not regarded as a change.

  2. Patient 12 always used type B, but changed from brand 2 to brand 3.

  3. Patient 13 changed from type A to type B, and then changed again from brand 2 to brand 3.

    df <- data.frame(id = c(11,11,11,11,12,12,12,12,13,13,13,13),
                     drug_type = c("A","A","B","B","B","B","B","B","A","A","B","B"),
                     drug_brand = c(1,1,2,2,2,3,3,3,1,1,2,3),
                     date = c("2020-01-01","2020-02-01","2020-03-01","2020-03-13",
                              "2019-04-05","2019-05-02","2019-06-03","2019-08-04",
                              "2021-02-02","2021-02-27","2021-03-22","2021-04-11"))
    df$date <- as.Date(df$date)
    

So how should I filter this dataset for the patients who changed drugs?
To solve this, I summarised the last date of use for type A and the first date of use for type B in two data frames. I then inner-joined them on id and kept the rows where the first date of type B is later than the last date of type A, but this only captures the change from type A to type B. I don't know how to identify all the patterns of drug change.
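For reference, the attempt described above might look roughly like this sketch. It assumes dplyr; the names `last_a_tbl`, `first_b_tbl`, `last_a_date`, `first_b_date` and `a_to_b` are made up for illustration, and the data is rebuilt only so that the snippet runs on its own.

```r
library(dplyr)

# Same example data as in the question, rebuilt so the snippet is self-contained.
df <- data.frame(
  id = rep(c(11, 12, 13), each = 4),
  drug_type = c("A","A","B","B", "B","B","B","B", "A","A","B","B"),
  drug_brand = c(1,1,2,2, 2,3,3,3, 1,1,2,3),
  date = as.Date(c("2020-01-01","2020-02-01","2020-03-01","2020-03-13",
                   "2019-04-05","2019-05-02","2019-06-03","2019-08-04",
                   "2021-02-02","2021-02-27","2021-03-22","2021-04-11"))
)

# Last date on which each patient used a type A drug.
last_a_tbl <- df %>%
  filter(drug_type == "A") %>%
  group_by(id) %>%
  summarise(last_a_date = max(date), .groups = "drop")

# First date on which each patient used a type B drug.
first_b_tbl <- df %>%
  filter(drug_type == "B") %>%
  group_by(id) %>%
  summarise(first_b_date = min(date), .groups = "drop")

# Keep patients whose first type B date is later than their last type A date.
a_to_b <- inner_join(last_a_tbl, first_b_tbl, by = "id") %>%
  filter(first_b_date > last_a_date)
```

With the example data this keeps patients 11 and 13. Patient 12 never used type A, so the inner join drops that patient, which is why the brand-only change is missed.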

I haven't found any solution or any similar question about this, so I sincerely hope you can share your ideas with me. Thank you for your time.


Solution

  • Perhaps you could look for a transition of drug_type from "A" to "B" within each patient, or keep the patients whose number of distinct drug_brand values is greater than 1?

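    library(tidyverse)

    df %>%
      arrange(id, date) %>%  # lag() relies on chronological order within each patient
      group_by(id) %>%
      # Keep patients with an A -> B transition or with more than one brand.
      # na.rm = TRUE ignores the NA that lag() yields on each patient's first row.
      filter(any(drug_type == "B" & lag(drug_type) == "A", na.rm = TRUE) |
               n_distinct(drug_brand) > 1) %>%
      ungroup()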
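If it helps to see which pattern each patient shows, the same two conditions could be computed as separate flags. This is only a sketch under a few assumptions: it uses dplyr, the column names `a_to_b` and `brand_change` are invented here, and patient 14 is a hypothetical extra patient who goes from B back to A, added only to check that such a change is not flagged. Counting brands among the type B rows only is also a choice made for this sketch, since every patient who used both A and B has more than one distinct brand overall.

```r
library(dplyr)

# Same example data as in the question, rebuilt so the snippet is self-contained.
df <- data.frame(
  id = rep(c(11, 12, 13), each = 4),
  drug_type = c("A","A","B","B", "B","B","B","B", "A","A","B","B"),
  drug_brand = c(1,1,2,2, 2,3,3,3, 1,1,2,3),
  date = as.Date(c("2020-01-01","2020-02-01","2020-03-01","2020-03-13",
                   "2019-04-05","2019-05-02","2019-06-03","2019-08-04",
                   "2021-02-02","2021-02-27","2021-03-22","2021-04-11"))
)

# Hypothetical patient (not in the original data) who switches from B back to A.
extra <- data.frame(
  id = 14,
  drug_type = c("B", "A"),
  drug_brand = c(2, 1),
  date = as.Date(c("2022-01-01", "2022-02-01"))
)

patterns <- bind_rows(df, extra) %>%
  arrange(id, date) %>%
  group_by(id) %>%
  summarise(
    # TRUE if a type B row directly follows a type A row for this patient.
    a_to_b = any(drug_type == "B" & lag(drug_type) == "A", na.rm = TRUE),
    # TRUE if more than one brand appears among the patient's type B rows.
    brand_change = n_distinct(drug_brand[drug_type == "B"]) > 1,
    .groups = "drop"
  ) %>%
  filter(a_to_b | brand_change)

# Rows of the original data for the patients who changed drugs.
changed <- semi_join(df, patterns, by = "id")
```

Here `patterns` holds one row per patient who changed drugs, with patient 13 flagged on both columns, and the hypothetical patient 14 is filtered out because neither flag is set.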