Tags: r, string, dataframe, dplyr, mutate

If string detected in df, create new column with replacement string + R + dplyr


I'm using dplyr's mutate function to find strings in a column (Name in the example) and replace them with a corrected string in R. This works well on both partial and full strings.

Method:

library(dplyr)
library(stringr)

df <- data.frame(Name = c("Jim","Bob","Sue","Sally","Jimmm","Boob","Suezi","Sallyyyy","Jim","Bob","Sue","Sally"),
                 Period = c("P1","P1","P1","P1","P2","P2","P2","P2","P3","P3","P3","P3"),
                 Value = c(150, 200, 325, 120, 760,245,46,244,200, 325, 120, 760))

df <- df %>%
  mutate(Name = case_when(
    str_detect(Name, "Jim") ~ "Jim",
    str_detect(Name, "Sue") ~ "Sue",
    TRUE ~ Name)) %>%
  mutate(across(Name, str_replace, "Sallyyyy", "Sally"))

In my real application I realized I should probably keep the original column for reference and create a new column with the corrections.

I tried simply adding a new column the standard way in R, as below:

df$test <- df %>% 
  mutate(Name = case_when(
  str_detect(Name, "Jim") ~ "Jim",
  TRUE ~ Name)) %>%
  mutate(across(Name, str_replace, "Sallyyyy", "Sally")) 

but instead of just creating a new column called test, in this case it creates a copy of the entire dataframe.
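
A likely explanation for this behavior is that the pipeline on the right-hand side returns a whole data frame, not a single vector, so assigning it to one column nests the data frame inside it. The sketch below illustrates this on a shortened version of the example data; the names `corrected` and the three-row `df` are hypothetical, and `pull()` is used here only as one possible way to extract the single column.

```r
library(dplyr)
library(stringr)

# Shortened, hypothetical version of the example data
df <- data.frame(Name = c("Jim", "Jimmm", "Sallyyyy"),
                 Value = c(150, 760, 244),
                 stringsAsFactors = FALSE)

# The pipeline returns a complete data frame, not a character vector
corrected <- df %>%
  mutate(Name = case_when(
    str_detect(Name, "Jim") ~ "Jim",
    TRUE ~ Name))

class(corrected)

# Extracting just the corrected column gives a plain vector to assign
df$test <- pull(corrected, Name)
```

With the vector extracted first, `df$test` holds only the corrected names and the original `Name` column is left untouched.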

Is there a method within the mutate function that will allow me to create a new column with the correction as opposed to replacing it in the original column?


Solution

  • I realized I was overcomplicating this. The simple solution is to create a copy of the column and apply the corrections to the copy, like so:

    df$Name_Correct <- df$Name
    

    Full solution:

    library(dplyr)
    library(stringr)

    df <- data.frame(Name = c("Jim","Bob","Sue","Sally","Jimmm","Boob","Suezi","Sallyyyy","Jim","Bob","Sue","Sally"),
                     Period = c("P1","P1","P1","P1","P2","P2","P2","P2","P3","P3","P3","P3"),
                     Value = c(150, 200, 325, 120, 760,245,46,244,200, 325, 120, 760))
    
    df$Name_Correct <- df$Name
    
    df <- df %>% 
      mutate(Name_Correct = case_when(
        str_detect(Name_Correct, "Jim") ~ "Jim",
        str_detect(Name_Correct, "Sue") ~ "Sue",
        TRUE ~ Name_Correct)) %>%
      mutate(across(Name_Correct, str_replace, "Sallyyyy", "Sally"))
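
As a possible alternative to copying the column first, mutate can create the new column directly: give the left-hand side a new name and read from the original column on the right. The following is a minimal sketch on a shortened version of the example data, assuming character (not factor) input; it also replaces the `across()` step with a plain `str_replace()` call inside the same mutate, which is a substitution on my part rather than the original approach.

```r
library(dplyr)
library(stringr)

# Shortened version of the example data
df <- data.frame(Name = c("Jim", "Bob", "Sue", "Sally",
                          "Jimmm", "Boob", "Suezi", "Sallyyyy"),
                 Value = c(150, 200, 325, 120, 760, 245, 46, 244),
                 stringsAsFactors = FALSE)

df <- df %>%
  mutate(
    # New column name on the left, original column on the right
    Name_Correct = case_when(
      str_detect(Name, "Jim") ~ "Jim",
      str_detect(Name, "Sue") ~ "Sue",
      TRUE ~ Name),
    # Later arguments in the same mutate can use the column just created
    Name_Correct = str_replace(Name_Correct, "Sallyyyy", "Sally"))
```

Because `Name` only appears on the right-hand side, it keeps its original values. Note that "Boob" passes through unchanged, since no rule in the example matches it.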