Search code examples
rdplyrstrsplitmutate

Issue creating new variable with mutate R using strsplit


I am trying to create new variable using mutate from column. I am using strsplit to perform some string operation to create new variable contents. But it is not working as desired.

myDF = data.frame(filename = c("T9719178_Mazda_20230415",
                           "T9719179_Mazda_20230415", 
                           "T9719180_Mazda_20230415", 
                           "T9719001_Tesla_20230415", 
                           "T9719002_Tesla_20230415", 
                           "T9719003_Tesla_20230415"))

myDF%>%
mutate(LotCode = as.factor(strsplit(filename,"_",fixed=TRUE)[[1]][1]))%>%
mutate(ModelName = as.factor(strsplit(filename,"_",fixed=TRUE)[[1]][2]))

Current output 
                    filename                LotCode     ModelName
                    T9719178_Mazda_20230415 T9719178     Mazda
                    T9719179_Mazda_20230415 T9719178     Mazda
                    T9719180_Mazda_20230415 T9719178     Mazda
                    T9719001_Tesla_20230415 T9719178     Mazda
                    T9719002_Tesla_20230415 T9719178     Mazda
                    T9719003_Tesla_20230415 T9719178     Mazda

 Expected output 

             filename       LotCode  ModelName
  T9719178_Mazda_20230415 T9719178     Mazda
  T9719179_Mazda_20230415 T9719179     Mazda
  T9719180_Mazda_20230415 T9719180     Mazda
  T9719001_Tesla_20230415 T9719001     Tesla
  T9719002_Tesla_20230415 T9719002     Tesla
  T9719003_Tesla_20230415 T9719003     Tesla

I suppose above should be straight forward but not sure what is wrong. Any suggestions?


Solution

  • The issue is that for each row in your dataset you are extracting the same (i.e., the first) element of the list returned by strsplit().

    To rectify your code, you can use map() (or apply()) to extract the first and second elements from each element of the list:

    library(tidyverse)
    
    myDF %>%
      mutate(LotCode = map(strsplit(filename, "_", fixed = TRUE), 1)) %>%
      mutate(ModelName = map(strsplit(filename, "_", fixed = TRUE), 2))
    #>                  filename  LotCode ModelName
    #> 1 T9719178_Mazda_20230415 T9719178     Mazda
    #> 2 T9719179_Mazda_20230415 T9719179     Mazda
    #> 3 T9719180_Mazda_20230415 T9719180     Mazda
    #> 4 T9719001_Tesla_20230415 T9719001     Tesla
    #> 5 T9719002_Tesla_20230415 T9719002     Tesla
    #> 6 T9719003_Tesla_20230415 T9719003     Tesla
    

    Created on 2023-04-15 with reprex v2.0.2

    Alternatively, you could use sep_wider_delim() instead which has the advantage of returning a vector instead of a list:

    library(tidyverse)
    
    myDF |> 
      separate_wider_delim(
        filename,
        delim = "_",
        names = c("filename", "LotCode", "ModelName"),
        cols_remove = FALSE
      )
    #> # A tibble: 6 × 3
    #>   filename                LotCode ModelName
    #>   <chr>                   <chr>   <chr>    
    #> 1 T9719178_Mazda_20230415 Mazda   20230415 
    #> 2 T9719179_Mazda_20230415 Mazda   20230415 
    #> 3 T9719180_Mazda_20230415 Mazda   20230415 
    #> 4 T9719001_Tesla_20230415 Tesla   20230415 
    #> 5 T9719002_Tesla_20230415 Tesla   20230415 
    #> 6 T9719003_Tesla_20230415 Tesla   20230415
    

    Created on 2023-04-15 with reprex v2.0.2