I tried the base R answer from this post: Splitting a string into new rows in R
split_dfcheck <- as.data.frame(do.call(rbind, apply(test, 1, function(x) {
  do.call(expand.grid, strsplit(x, " *; *"))
})))
However, when I use it on my example data:
test <- data.frame(
  name = c("Andrea;barbara;erica;Matteo", "Matteo;sonia;marta", "Matteo;sonia", "sonia;marta;matteo"),
  code = c("A", "A", "B", "B"),
  number = c(1, 1, 2, 2),
  surname = c("surname1;surname2;surname3;surname4", "surname1;surname2;surname3", "surname1;surname2", "surname1;surname2;surname3")
)
I get 38 rows (it seems to produce every name/surname combination within each row), while with separate_rows I get the 12 rows I want:
library(tidyr)  # separate_rows(); tidyr also re-exports %>%
test1 <- test %>%
  separate_rows(c(name, surname), sep = ";")
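A quick check of where the two row counts come from, using the example data: `expand.grid()` returns every combination of its inputs, so each original row contributes (number of names) × (number of surnames) rows, while `separate_rows()` pairs the pieces position by position. The variable names `n_name` and `n_surname` are just illustrative.

```r
# Example data from the question
test <- data.frame(
  name    = c("Andrea;barbara;erica;Matteo", "Matteo;sonia;marta",
              "Matteo;sonia", "sonia;marta;matteo"),
  code    = c("A", "A", "B", "B"),
  number  = c(1, 1, 2, 2),
  surname = c("surname1;surname2;surname3;surname4",
              "surname1;surname2;surname3",
              "surname1;surname2",
              "surname1;surname2;surname3")
)

# Number of pieces in each row's delimited columns
n_name    <- lengths(strsplit(test$name, ";"))     # 4 3 2 3
n_surname <- lengths(strsplit(test$surname, ";"))  # 4 3 2 3

# expand.grid() builds every combination within a row (code and number
# split into a single piece each, so they do not change the count)
sum(n_name * n_surname)  # 16 + 9 + 4 + 9 = 38

# separate_rows() pairs names and surnames position by position
sum(n_name)              # 4 + 3 + 2 + 3 = 12
```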
Is there a way to get the same result as with separate_rows, but using base R?
Thanks!
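# Build one data frame per row and stack them with rbind()
# (the \(x) lambda shorthand needs R >= 4.1.0)
test2 <- do.call(
  rbind,
  apply(test, 1, \(x) as.data.frame(sapply(x, strsplit, ";")))
)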
test2
# name code number surname
# 1 Andrea A 1 surname1
# 2 barbara A 1 surname2
# 3 erica A 1 surname3
# 4 Matteo A 1 surname4
# 5 Matteo A 1 surname1
# 6 sonia A 1 surname2
# 7 marta A 1 surname3
# 8 Matteo B 2 surname1
# 9 sonia B 2 surname2
# 10 sonia B 2 surname1
# 11 marta B 2 surname2
# 12 matteo B 2 surname3
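To see why this pairs the pieces instead of combining them, here is what happens to a single row. The vector `x` below is a hand-written stand-in for what `apply()` passes in: a named character vector, because `apply()` coerces the data frame to a character matrix first. `as.data.frame()` then recycles the length-1 pieces (`code`, `number`) to the length of the others, giving one row per name/surname pair rather than one per combination.

```r
# One row of the question's data, in the form apply() hands it to the function
x <- c(name = "Matteo;sonia", code = "B", number = "2",
       surname = "surname1;surname2")

# Split every element on ";"; the element names (column names) are kept
pieces <- sapply(x, strsplit, ";")
str(pieces)

# Recycling the length-1 pieces gives 2 rows, pairing Matteo/surname1
# and sonia/surname2, instead of the 4 combinations expand.grid() makes
row_df <- as.data.frame(pieces)
row_df
```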
This has the drawback of converting all columns to character, because apply() first coerces the data frame to a character matrix. If your columns are only character or numeric, you can convert the numeric ones back with:
numeric_columns <- Filter(\(x) is.numeric(test[[x]]), names(test))
test2[numeric_columns] <- lapply(test2[numeric_columns], as.numeric)
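If you would rather not convert types back afterwards, one alternative is to split only the delimited columns and repeat the other columns by row index. This is a minimal sketch, not part of the answer above: `separate_rows_base()` is a hypothetical helper name, and it assumes the delimited columns are character (not factor) and split into the same number of pieces within each row, as in the example.

```r
# Hypothetical helper: a base R stand-in for tidyr::separate_rows() that
# keeps the types of the columns it does not split.
separate_rows_base <- function(df, cols, sep = ";") {
  # Split each delimited column into a list of character vectors
  pieces <- lapply(df[cols], strsplit, split = sep)
  # Number of pieces per row, taken from the first delimited column
  n <- lengths(pieces[[1]])
  # All delimited columns must split into the same number of pieces per row
  for (p in pieces) {
    if (any(lengths(p) != n)) {
      stop("delimited columns split into different numbers of pieces")
    }
  }
  # Repeat each original row once per piece; untouched columns keep their type
  out <- df[rep(seq_len(nrow(df)), n), , drop = FALSE]
  # Replace the delimited columns with the flattened pieces
  out[cols] <- lapply(pieces, unlist, use.names = FALSE)
  rownames(out) <- NULL
  out
}

test <- data.frame(
  name    = c("Andrea;barbara;erica;Matteo", "Matteo;sonia;marta",
              "Matteo;sonia", "sonia;marta;matteo"),
  code    = c("A", "A", "B", "B"),
  number  = c(1, 1, 2, 2),
  surname = c("surname1;surname2;surname3;surname4",
              "surname1;surname2;surname3",
              "surname1;surname2",
              "surname1;surname2;surname3")
)

test3 <- separate_rows_base(test, c("name", "surname"))
str(test3)  # number stays numeric
```

Because the untouched columns are subset with `df[rep(...), ]` rather than passed through `apply()`, they are never coerced to character. One difference to be aware of: a delimited field that is an empty string splits into zero pieces, so this sketch drops that row.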