I have a data frame in which I need to edit the disease names. Each disease has several rows. For some reason, when I use str_replace_all, the replacement does not happen for two of the conditions ("Peripheral neuropathies (excluding cranial nerve and carpal tunnel syndromes)" and "Venous thromboembolic disease (Excl PE)"). There is no warning or error message in the output, so I can't figure out what the issue is. Does anyone have any ideas?
codelists <- data.frame(Disease = sample(c("Peripheral neuropathies (excluding cranial nerve and carpal tunnel syndromes)", "Primary Malignancy_Brain, Other CNS and Intracranial", "Venous thromboembolic disease (Excl PE)"), 15, replace = TRUE), Codes = 1:15)
## Sort the dataframe according to Disease
codelists <- codelists[order(codelists$Disease), ]
library(stringr)
codelists$Disease2 <- str_replace_all(codelists$Disease, c("Peripheral neuropathies (excluding cranial nerve and carpal tunnel syndromes)" = "Non-diabetic peripheral neuropathies (excluding cranial nerves and carpal tunnel syndrome)", "Primary Malignancy_Brain, Other CNS and Intracranial" = "Primary malignancy brain, other CNS and intracranial", "Venous thromboembolic disease (Excl PE)" = "Venous thromboembolism"))
Thanks.
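In regex, characters like *, ( and ) have special meaning, and str_replace_all uses regex matching by default. In the pattern "Venous thromboembolic disease (Excl PE)", for example, the parentheses form a capturing group, so the pattern matches the text "Venous thromboembolic disease Excl PE" rather than the literal name with parentheses. Since nothing in your data matches, the value is returned unchanged, which is why you see no warning or error. "Primary Malignancy_Brain, Other CNS and Intracranial" is replaced correctly because it contains no special characters. Since you want to match strings like "(excluding cranial nerve and carpal tunnel syndromes)" exactly, wrap the patterns in fixed().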
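A minimal sketch on a single value from the question makes the difference visible. It assumes only stringr; x, rule and escaped_rule are illustrative names. Besides fixed(), it also tries escaping the parentheses (written "\\(" and "\\)" in an R string), which works too but means every special character has to be escaped by hand.

```r
library(stringr)

x <- "Venous thromboembolic disease (Excl PE)"
rule <- c("Venous thromboembolic disease (Excl PE)" = "Venous thromboembolism")

# Regex (the default): "(Excl PE)" is a capturing group that matches the
# text "Excl PE", so the literal parentheses in x are never matched.
regex_result <- str_replace_all(x, rule)

# The same regex does match once the parentheses are absent from the input.
matches_without_parens <- str_detect("Venous thromboembolic disease Excl PE",
                                     names(rule))

# fixed(): each name in the vector is compared character for character.
fixed_result <- str_replace_all(x, fixed(rule))

# Escaping the parentheses makes the regex match them literally.
escaped_rule <- c("Venous thromboembolic disease \\(Excl PE\\)" = "Venous thromboembolism")
escaped_result <- str_replace_all(x, escaped_rule)

print(regex_result)            # [1] "Venous thromboembolic disease (Excl PE)"
print(matches_without_parens)  # [1] TRUE
print(fixed_result)            # [1] "Venous thromboembolism"
print(escaped_result)          # [1] "Venous thromboembolism"
```

fixed() is usually the simpler choice here, since long clinical names can contain several characters that would each need escaping.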
library(stringr)
# fixed() makes each name in the vector a literal pattern instead of a regex
codelists$Disease2 <- str_replace_all(codelists$Disease, fixed(c("Peripheral neuropathies (excluding cranial nerve and carpal tunnel syndromes)" = "Non-diabetic peripheral neuropathies (excluding cranial nerves and carpal tunnel syndrome)", "Primary Malignancy_Brain, Other CNS and Intracranial" = "Primary malignancy brain, other CNS and intracranial", "Venous thromboembolic disease (Excl PE)" = "Venous thromboembolism")))
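Because every Disease value is replaced in its entirety, a different technique (base R named-vector indexing, not the fixed() approach) also works: index a lookup table by the old names, which compares whole strings exactly and involves no pattern matching at all. This is a sketch assuming the three names from the question; disease_map and new_names are hypothetical names, set.seed() is added only to make the sample reproducible, and stringsAsFactors = FALSE keeps Disease as character on R versions before 4.0 (indexing by a factor would use its integer codes instead of the names).

```r
# Hypothetical lookup table: old name -> new name (values from the question).
disease_map <- c(
  "Peripheral neuropathies (excluding cranial nerve and carpal tunnel syndromes)" =
    "Non-diabetic peripheral neuropathies (excluding cranial nerves and carpal tunnel syndrome)",
  "Primary Malignancy_Brain, Other CNS and Intracranial" =
    "Primary malignancy brain, other CNS and intracranial",
  "Venous thromboembolic disease (Excl PE)" = "Venous thromboembolism"
)

set.seed(1)
codelists <- data.frame(
  Disease = sample(names(disease_map), 15, replace = TRUE),
  Codes = 1:15,
  stringsAsFactors = FALSE
)
codelists <- codelists[order(codelists$Disease), ]

# Character indexing with [ uses exact name matching; unname() drops the
# names that the subset carries over from disease_map.
new_names <- unname(disease_map[codelists$Disease])

# A disease with no entry in the map yields NA, so keep its original value.
codelists$Disease2 <- ifelse(is.na(new_names), codelists$Disease, new_names)

print(unique(codelists[, c("Disease", "Disease2")]))
```

Diseases missing from disease_map are kept unchanged; drop the ifelse() step if you would rather get NA for unmapped names so they are easy to spot.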