I'm using the mutate method to find strings in a column (Name
in the example)and replace them with a corrected string in R, which works well both on partial and full strings.
Method:
df <- data.frame(Name = c("Jim","Bob","Sue","Sally","Jimmm","Boob","Suezi","Sallyyyy","Jim","Bob","Sue","Sally"),
Period = c("P1","P1","P1","P1","P2","P2","P2","P2","P3","P3","P3","P3"),
Value = c(150, 200, 325, 120, 760,245,46,244,200, 325, 120, 760))
df <- df %>%
mutate(Name = case_when(
str_detect(Name, "Jim") ~ "Jim",
str_detect(Name, "Sue") ~ "Sue",
TRUE ~ Name)) %>%
mutate(across(Name, str_replace, "Sallyyyy", "Sally"))
In my real application I realized I should probably maintain the original column for reference and and create a new column with the corrections.
I tried simply adding a new column the standard way in r, as below:
df$test <- df %>%
mutate(Name = case_when(
str_detect(Name, "Jim") ~ "Jim",
TRUE ~ Name)) %>%
mutate(across(Name, str_replace, "Sallyyyy", "Sally"))
but instead of just creating a new column called test
, in this case it creates a copy of the entire dataframe.
Is there a method within the mutate function that will allow me to create a new column with the correction as opposed to replacing it in the original column?
I realized I'm over complicating this, the simple solution here is to just create a copy of the column and apply the correction to the copied column, like so:
df$Name_Correct <- df$Name
Full solution:
df <- data.frame(Name = c("Jim","Bob","Sue","Sally","Jimmm","Boob","Suezi","Sallyyyy","Jim","Bob","Sue","Sally"),
Period = c("P1","P1","P1","P1","P2","P2","P2","P2","P3","P3","P3","P3"),
Value = c(150, 200, 325, 120, 760,245,46,244,200, 325, 120, 760))
df$Name_Correct <- df$Name
df <- df %>%
mutate(Name_Correct = case_when(
str_detect(Name_Correct, "Jim") ~ "Jim",
str_detect(Name_Correct, "Sue") ~ "Sue",
TRUE ~ Name_Correct)) %>%
mutate(across(Name_Correct, str_replace, "Sallyyyy", "Sally"))