
Why is str_replace_all not replacing strings in R?


I have a data frame in which I need to edit the disease names. Each disease has several rows associated with it. For some reason, when I use str_replace_all, the replacement does not happen for two of the conditions ("Peripheral neuropathies (excluding cranial nerve and carpal tunnel syndromes)" and "Venous thromboembolic disease (Excl PE)"). There is no warning or error message in the output, so I can't figure out what the issue is. Does anyone have any ideas?

codelists <- data.frame(Disease = sample(c("Peripheral neuropathies (excluding cranial nerve and carpal tunnel syndromes)", "Primary Malignancy_Brain, Other CNS and Intracranial", "Venous thromboembolic disease (Excl PE)"), 15, replace = TRUE), Codes = 1:15)

## Sort the dataframe according to Disease
codelists <- codelists[order(codelists$Disease), ]

library(stringr)
codelists$Disease2 <- str_replace_all(codelists$Disease, c("Peripheral neuropathies (excluding cranial nerve and carpal tunnel syndromes)" = "Non-diabetic peripheral neuropathies (excluding cranial nerves and carpal tunnel syndrome)", "Primary Malignancy_Brain, Other CNS and Intracranial" = "Primary malignancy brain, other CNS and intracranial", "Venous thromboembolic disease (Excl PE)" = "Venous thromboembolism"))

Thanks.


Solution

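  • In regex, characters such as * and ( have special meaning, and str_replace_all uses regex matching by default. The parentheses in a pattern like "(Excl PE)" are therefore read as a group rather than as literal characters, so the pattern never matches the text and the string is returned unchanged, with no warning. Since you want to match strings such as "(excluding cranial nerve and carpal tunnel syndromes)" exactly, wrap the patterns in fixed().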

    library(stringr)
    
    # Names are the literal strings to find; values are their replacements.
    codelists$Disease2 <- str_replace_all(codelists$Disease, fixed(c(
      "Peripheral neuropathies (excluding cranial nerve and carpal tunnel syndromes)" = "Non-diabetic peripheral neuropathies (excluding cranial nerves and carpal tunnel syndrome)",
      "Primary Malignancy_Brain, Other CNS and Intracranial" = "Primary malignancy brain, other CNS and intracranial",
      "Venous thromboembolic disease (Excl PE)" = "Venous thromboembolism"
    )))
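
To see the difference on a single value, here is a minimal sketch using one of the disease names from the question. The variable names (disease, pattern, regex_hit, fixed_hit, escaped) are made up for illustration. It also shows an alternative to fixed(): keeping regex matching but escaping the parentheses with a double backslash.

```r
library(stringr)

disease <- "Venous thromboembolic disease (Excl PE)"
pattern <- "Venous thromboembolic disease (Excl PE)"

# Interpreted as a regex, "(Excl PE)" is a group that matches the text
# "Excl PE" without parentheses, so the pattern does not match the data.
regex_hit <- str_detect(disease, pattern)

# fixed() compares the pattern literally, parentheses included.
fixed_hit <- str_detect(disease, fixed(pattern))

# Alternative to fixed(): keep regex matching but escape the parentheses.
escaped <- str_replace_all(
  disease,
  "Venous thromboembolic disease \\(Excl PE\\)",
  "Venous thromboembolism"
)

print(regex_hit)  # FALSE
print(fixed_hit)  # TRUE
print(escaped)    # "Venous thromboembolism"
```

fixed() is usually the simpler choice when every pattern is a literal string, as in the question. Escaping is only worth the extra typing if you also need regex features elsewhere in the same pattern.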