Is there an R function to convert a dataframe of ICD10 codes to their respective subchapters?


I have a dataframe of ICD10 codes that I need to convert to their respective subchapters. The subchapter of each code is identified using its first 3 characters, e.g. the lookup key for M1711 is M17.

Is there an efficient way to map from these codes to their subchapters?

Here's an example dataset of codes that I'm using:

df <- data.frame(codes = c("Z23","M1711","E0500","Z00129","G4452"))

I understand that Jack O. Wasey has a great package, icd, that can convert codes to comorbidities and also includes a subchapter dataset:

install.packages("devtools")
devtools::install_github("jackwasey/icd")

sub_chap <- icd::icd10_sub_chapters

But the data is stored as a start/end range for each subchapter, so it is not in the right format for joining to.

When I unlist the subchapters, I only get the start and end values of each range; the codes in between are missing:

sub_chap_df = as.data.frame(unlist(sub_chap))

Is there an efficient way to convert my ICD10 codes to their respective subchapters?

[Image: subchapters in list format]

[Image: subchapters in data frame format, with the in-between values missing]


Solution

  • You can use tidyr::complete() and tidyr::full_seq() to fill in the full range of codes. You’ll also need to separate the letter and numeric parts of the code to use full_seq(), then join them back together.
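
    As a quick illustration of the gap-filling step on its own, here is a minimal sketch of what full_seq() returns for a made-up start/end pair (the values 1 and 9 are only an example, not taken from the icd data):

    ```r
    library(tidyr)

    # full_seq() takes the range of its input and returns every value
    # between the minimum and maximum, spaced by the given period.
    filled <- full_seq(c(1, 9), period = 1)
    filled
    # 1 2 3 4 5 6 7 8 9
    ```

    Inside complete(), this is what turns the two rows per subchapter (start and end) into one row per code.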

    Note I don’t have the icd package installed, so I made some quick stand-in data.

    library(tidyverse)
    
    # example data
    sub_chap <- list(
      cat1 = c(start = "A01", end = "A09"),
      cat2 = c(start = "A15", end = "A19")
    )
    
    subchap_lookup <- tibble(
        subchapter = names(sub_chap),
        codes = sub_chap
      ) %>%
      unnest_longer(codes, indices_include = FALSE) %>% 
      separate(codes, into = c("letter", "number"), sep = 1, convert = TRUE) %>%
      group_by(subchapter, letter) %>%
      complete(number = full_seq(number, 1)) %>%
      ungroup() %>%
      mutate(
        codes = str_c(letter, str_pad(number, 2, pad = "0")),
        .keep = "unused"
      )
    

    Output:

    # A tibble: 14 × 2
       subchapter codes
       <chr>      <chr>
     1 cat1       A01  
     2 cat1       A02  
     3 cat1       A03  
     4 cat1       A04  
     5 cat1       A05  
     6 cat1       A06  
     7 cat1       A07  
     8 cat1       A08  
     9 cat1       A09  
    10 cat2       A15  
    11 cat2       A16  
    12 cat2       A17  
    13 cat2       A18  
    14 cat2       A19  
    

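    You can then join on the first three characters of each code, since the lookup only contains three-character codes and a direct join on the full codes would match only those that are already three characters long (such as Z23):

    df %>%
      mutate(code3 = str_sub(codes, 1, 3)) %>%
      left_join(subchap_lookup, by = c("code3" = "codes"))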
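
    If expanding every range is not wanted, an alternative is to compare each three-character prefix against the start/end values directly. The following is only a sketch in base R: find_subchapter() is a hypothetical helper, not part of icd or tidyr, and the ranges are the same made-up stand-in data as above. It assumes each subchapter is a named character vector with "start" and "end" elements, and that string comparison in your locale orders these codes the way the ICD10 listing does.

    ```r
    # Stand-in ranges (made up, same shape as the example data above).
    sub_chap <- list(
      cat1 = c(start = "A01", end = "A09"),
      cat2 = c(start = "A15", end = "A19")
    )

    # Hypothetical helper: for each code, return the name of the first range
    # whose start/end bracket the code's 3-character prefix, or NA if none does.
    find_subchapter <- function(codes, ranges) {
      prefix <- substr(codes, 1, 3)
      starts <- vapply(ranges, function(r) r[["start"]], character(1))
      ends   <- vapply(ranges, function(r) r[["end"]], character(1))
      vapply(prefix, function(p) {
        hit <- which(p >= starts & p <= ends)
        if (length(hit) == 0) NA_character_ else names(ranges)[hit[1]]
      }, character(1), USE.NAMES = FALSE)
    }

    result <- find_subchapter(c("A0109", "A17", "Z23"), sub_chap)
    result
    # "cat1" "cat2" NA
    ```

    Because this never converts the numeric part of a code to a number, it should also tolerate categories with a letter in the third position (I believe ICD-10-CM has some, such as C7A), which separate(..., convert = TRUE) followed by full_seq() would not handle. It does loop over the codes one at a time, so the join approach is likely to be faster on large data.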