
How to use ICD10 Code in a regression model in R?


I am trying to find which ICD10 codes are causing a certain disease. ICD10 uses an alphanumeric classification, e.g. A00.00, and there are thousands of such codes, so I am not sure how to use them in my regression model. Any suggestions, please?

Data

    Patient  Existing ICD10  Diabetic (Y)
    P1       A00.10          1
    P2       A00.20          0
    P1       C00.1           1
    P3       Z01             1
    ....


Solution

  • An effective way to do this is to use the concept of comorbidities: individual codes are grouped into a standardized set of disease categories, e.g. "Diabetes", "Cancer", "Heart Disease". My R package icd does this. There is a choice of comorbidity maps, so you can pick one that aligns with your interests, e.g. the PCCC maps in icd can be used for pediatrics; the others are for adults and span a variety of disease states.

    Here is an example, as described in the introduction vignette. These are actually ICD-9 codes, but you can use ICD-10 codes in the same way.

    patients <- data.frame(
       visit_id = c(1000, 1000, 1000, 1000, 1001, 1001, 1002),
       icd9 = c("40201", "2258", "7208", "25001", "34400", "4011", "4011"),
       poa = c("Y", NA, "N", "Y", "X", "Y", "E"),
       stringsAsFactors = FALSE
       )
    patients
    
      visit_id  icd9  poa
    1     1000 40201    Y
    2     1000  2258 <NA>
    3     1000  7208    N
    4     1000 25001    Y
    5     1001 34400    X
    6     1001  4011    Y
    7     1002  4011    E
    
    icd::comorbid_ahrq(patients)
    
           CHF Valvular  PHTN   PVD   HTN Paralysis NeuroOther Pulmonary    DM  DMcx Hypothyroid Renal Liver
    1000  TRUE    FALSE FALSE FALSE  TRUE     FALSE      FALSE     FALSE  TRUE FALSE       FALSE FALSE FALSE
    1001 FALSE    FALSE FALSE FALSE  TRUE      TRUE      FALSE     FALSE FALSE FALSE       FALSE FALSE FALSE
    1002 FALSE    FALSE FALSE FALSE  TRUE     FALSE      FALSE     FALSE FALSE FALSE       FALSE FALSE FALSE
           PUD   HIV Lymphoma  Mets Tumor Rheumatic Coagulopathy Obesity WeightLoss FluidsLytes BloodLoss
    1000 FALSE FALSE    FALSE FALSE FALSE      TRUE        FALSE   FALSE      FALSE       FALSE     FALSE
    1001 FALSE FALSE    FALSE FALSE FALSE     FALSE        FALSE   FALSE      FALSE       FALSE     FALSE
    1002 FALSE FALSE    FALSE FALSE FALSE     FALSE        FALSE   FALSE      FALSE       FALSE     FALSE
         Anemia Alcohol Drugs Psychoses Depression
    1000  FALSE   FALSE FALSE     FALSE      FALSE
    1001  FALSE   FALSE FALSE     FALSE      FALSE
    1002  FALSE   FALSE FALSE     FALSE      FALSE
    

    Here "DM" is diabetes mellitus and "DMcx" is diabetes with complications, e.g. retinopathy or renal failure. These categories come from the US AHRQ modification of the standard Elixhauser categories.
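
    To make the idea of a comorbidity map concrete, here is a minimal base R sketch for ICD-10 data in the long format shown in the question. The `toy_map` list, the `dx` data frame and its codes are hypothetical and deliberately tiny; this is not one of the published maps that icd provides, and real analyses should rely on a validated map.

    ```r
    # Long-format data: one row per patient-code pair (illustrative codes only).
    dx <- data.frame(
      patient = c("P1", "P2", "P1", "P3"),
      icd10   = c("E11.9", "I10", "I50.9", "E10.9"),
      stringsAsFactors = FALSE
    )

    # Hypothetical toy map: category name -> ICD-10 code prefixes.
    toy_map <- list(
      DM  = c("E10", "E11"),  # diabetes mellitus
      HTN = c("I10"),         # essential hypertension
      CHF = c("I50")          # heart failure
    )

    # Drop the decimal point so "E11.9" is compared as "E119".
    codes <- gsub(".", "", dx$icd10, fixed = TRUE)

    # For each category, flag patients with any code starting with a listed prefix.
    flags <- sapply(toy_map, function(prefixes) {
      hit <- Reduce(`|`, lapply(prefixes, function(p) startsWith(codes, p)))
      tapply(hit, dx$patient, any)
    })
    flags
    ```

    The result is a logical matrix with one row per patient and one column per category, which is the same shape as the `comorbid_ahrq()` output above.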

    Once you have binary flags for the disease states, you can use them as predictors in any statistical or machine learning model.
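
    As a minimal sketch of that last step, binary flags can go straight into a logistic regression with `glm()`. The data below are simulated purely for illustration: the column names, the `outcome` variable and the effect sizes are all assumptions, so the fitted coefficients carry no clinical meaning.

    ```r
    set.seed(1)
    n <- 200

    # Simulated comorbidity flags, shaped like the output of a comorbidity map.
    flags <- data.frame(
      DM  = rbinom(n, 1, 0.3) == 1,
      HTN = rbinom(n, 1, 0.5) == 1,
      CHF = rbinom(n, 1, 0.2) == 1
    )

    # Simulated binary outcome whose log-odds rise with DM and CHF (assumed effects).
    log_odds <- -1 + 0.8 * flags$DM + 1.2 * flags$CHF
    flags$outcome <- rbinom(n, 1, plogis(log_odds))

    # Logistic regression: each logical flag becomes a single TRUE-vs-FALSE term.
    fit <- glm(outcome ~ DM + HTN + CHF, data = flags, family = binomial())
    summary(fit)$coefficients
    ```

    With real data, replace the simulated `flags` with the comorbidity output joined to your outcome by patient or visit ID.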