I am trying to create new variable using mutate from column. I am using strsplit to perform some string operation to create new variable contents. But it is not working as desired.
myDF = data.frame(filename = c("T9719178_Mazda_20230415",
"T9719179_Mazda_20230415",
"T9719180_Mazda_20230415",
"T9719001_Tesla_20230415",
"T9719002_Tesla_20230415",
"T9719003_Tesla_20230415"))
myDF%>%
mutate(LotCode = as.factor(strsplit(filename,"_",fixed=TRUE)[[1]][1]))%>%
mutate(ModelName = as.factor(strsplit(filename,"_",fixed=TRUE)[[1]][2]))
Current output
filename LotCode ModelName
T9719178_Mazda_20230415 T9719178 Mazda
T9719179_Mazda_20230415 T9719178 Mazda
T9719180_Mazda_20230415 T9719178 Mazda
T9719001_Tesla_20230415 T9719178 Mazda
T9719002_Tesla_20230415 T9719178 Mazda
T9719003_Tesla_20230415 T9719178 Mazda
Expected output
filename LotCode ModelName
T9719178_Mazda_20230415 T9719178 Mazda
T9719179_Mazda_20230415 T9719179 Mazda
T9719180_Mazda_20230415 T9719180 Mazda
T9719001_Tesla_20230415 T9719001 Tesla
T9719002_Tesla_20230415 T9719002 Tesla
T9719003_Tesla_20230415 T9719003 Tesla
I suppose above should be straight forward but not sure what is wrong. Any suggestions?
The issue is that for each row in your dataset you are extracting the same (i.e., the first) element of the list returned by strsplit()
.
To rectify your code, you can use map()
(or apply()
) to extract the first and second elements from each element of the list:
library(tidyverse)
myDF %>%
mutate(LotCode = map(strsplit(filename, "_", fixed = TRUE), 1)) %>%
mutate(ModelName = map(strsplit(filename, "_", fixed = TRUE), 2))
#> filename LotCode ModelName
#> 1 T9719178_Mazda_20230415 T9719178 Mazda
#> 2 T9719179_Mazda_20230415 T9719179 Mazda
#> 3 T9719180_Mazda_20230415 T9719180 Mazda
#> 4 T9719001_Tesla_20230415 T9719001 Tesla
#> 5 T9719002_Tesla_20230415 T9719002 Tesla
#> 6 T9719003_Tesla_20230415 T9719003 Tesla
Created on 2023-04-15 with reprex v2.0.2
Alternatively, you could use sep_wider_delim()
instead which has the advantage of returning a vector instead of a list:
library(tidyverse)
myDF |>
separate_wider_delim(
filename,
delim = "_",
names = c("filename", "LotCode", "ModelName"),
cols_remove = FALSE
)
#> # A tibble: 6 × 3
#> filename LotCode ModelName
#> <chr> <chr> <chr>
#> 1 T9719178_Mazda_20230415 Mazda 20230415
#> 2 T9719179_Mazda_20230415 Mazda 20230415
#> 3 T9719180_Mazda_20230415 Mazda 20230415
#> 4 T9719001_Tesla_20230415 Tesla 20230415
#> 5 T9719002_Tesla_20230415 Tesla 20230415
#> 6 T9719003_Tesla_20230415 Tesla 20230415
Created on 2023-04-15 with reprex v2.0.2