I am trying to find which ICD-10 codes are associated with a certain disease. But ICD-10 uses an alphanumeric classification, e.g. A00.00. There are thousands of such codes, and I am not sure how to use them in my regression model. Any suggestions, please?
Data:

Patient  Existing ICD10  Diabetic (Y)
P1       A00.10          1
P2       A00.20          0
P1       C00.1           1
P3       Z01             1
....
An effective way to do this is to use the concept of comorbidities: instead of treating each code as its own predictor, group the codes into a small number of disease categories. My R package icd does this for standardized sets of diseases, e.g. "Diabetes", "Cancer", "Heart Disease". There is a choice of comorbidity maps, so you can pick one that aligns with your interests: the PCCC maps in icd can be used for pediatrics, while the others are for adults and span a variety of disease states.
For example, as described in the introduction vignette (these are actually ICD-9 codes, but you can use ICD-10):
patients <- data.frame(
  visit_id = c(1000, 1000, 1000, 1000, 1001, 1001, 1002),
  icd9 = c("40201", "2258", "7208", "25001", "34400", "4011", "4011"),
  poa = c("Y", NA, "N", "Y", "X", "Y", "E"),
  stringsAsFactors = FALSE
)
patients
  visit_id  icd9  poa
1     1000 40201    Y
2     1000  2258 <NA>
3     1000  7208    N
4     1000 25001    Y
5     1001 34400    X
6     1001  4011    Y
7     1002  4011    E
icd::comorbid_ahrq(patients)
       CHF Valvular  PHTN   PVD   HTN Paralysis NeuroOther Pulmonary    DM  DMcx Hypothyroid Renal Liver
1000  TRUE    FALSE FALSE FALSE  TRUE     FALSE      FALSE     FALSE  TRUE FALSE       FALSE FALSE FALSE
1001 FALSE    FALSE FALSE FALSE  TRUE      TRUE      FALSE     FALSE FALSE FALSE       FALSE FALSE FALSE
1002 FALSE    FALSE FALSE FALSE  TRUE     FALSE      FALSE     FALSE FALSE FALSE       FALSE FALSE FALSE
       PUD   HIV Lymphoma  Mets Tumor Rheumatic Coagulopathy Obesity WeightLoss FluidsLytes BloodLoss
1000 FALSE FALSE    FALSE FALSE FALSE      TRUE        FALSE   FALSE      FALSE       FALSE     FALSE
1001 FALSE FALSE    FALSE FALSE FALSE     FALSE        FALSE   FALSE      FALSE       FALSE     FALSE
1002 FALSE FALSE    FALSE FALSE FALSE     FALSE        FALSE   FALSE      FALSE       FALSE     FALSE
     Anemia Alcohol Drugs Psychoses Depression
1000  FALSE   FALSE FALSE     FALSE      FALSE
1001  FALSE   FALSE FALSE     FALSE      FALSE
1002  FALSE   FALSE FALSE     FALSE      FALSE
Here "DM" is diabetes mellitus and "DMcx" is diabetes with complications, e.g. retinopathy or renal failure. This uses the US AHRQ modification of the standard Elixhauser categories.
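To show what such a mapping does with ICD-10 codes like those in the question, here is a minimal base R sketch that does not use icd at all. The three-entry map, the example codes, and the names toy_map and flag_comorbidities are assumptions made up for illustration; a real map such as the AHRQ one covers far more codes and categories.

```r
# Toy long-format data: one row per patient/code pair (made-up values).
dx <- data.frame(
  patient = c("P1", "P2", "P1", "P3"),
  icd10   = c("E11.9", "I10", "C00.1", "E10.65"),
  stringsAsFactors = FALSE
)

# Hypothetical, deliberately tiny map from category to ICD-10 code prefixes.
toy_map <- list(
  DM    = c("E10", "E11"),
  HTN   = c("I10"),
  Tumor = c("C00")
)

# Returns a logical matrix: one row per patient, one column per category.
flag_comorbidities <- function(data, map) {
  codes <- gsub(".", "", data$icd10, fixed = TRUE)  # drop the decimal point
  patients <- sort(unique(data$patient))
  out <- sapply(map, function(prefixes) {
    pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")")
    hit <- grepl(pattern, codes)
    # TRUE if any of the patient's codes falls in this category
    vapply(patients, function(p) any(hit[data$patient == p]), logical(1))
  })
  rownames(out) <- patients
  out
}

flags <- flag_comorbidities(dx, toy_map)
flags
```

Each patient ends up with one row of TRUE/FALSE flags regardless of how many codes they have, which is the shape a regression model needs.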
Once you have binary flags for the disease states, you can use them as predictors in any statistical or machine learning model.
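As a minimal sketch of that last step, the flags can go straight into a logistic regression with base R's glm. The data below are simulated, and the column names (HTN, Obesity, diabetic) and effect sizes are assumptions chosen for illustration only, not real results.

```r
set.seed(42)
n <- 500

# Simulated binary comorbidity flags (stand-ins for the columns that a
# comorbidity mapping would produce).
flags <- data.frame(
  HTN     = rbinom(n, 1, 0.4) == 1,
  Obesity = rbinom(n, 1, 0.3) == 1
)

# Simulated outcome whose log-odds rise with each flag (arbitrary effects).
log_odds <- -2 + 1.0 * flags$HTN + 1.5 * flags$Obesity
flags$diabetic <- rbinom(n, 1, plogis(log_odds))

# Logistic regression with the flags as predictors.
fit <- glm(diabetic ~ HTN + Obesity, data = flags, family = binomial)

# Odds ratios for the intercept and each flag.
odds_ratios <- exp(coef(fit))
odds_ratios
```

With a few dozen category flags instead of thousands of raw codes, the model has far fewer parameters to estimate and each coefficient is easier to interpret.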