I need to analyse a large dataset. For that, I need to separate a character variable into more than a thousand columns. The variable has the structure number$number$number$ and so on, for about a thousand numbers.
My data is stored in an SQLite database. I imported it into R using RSQLite. I tried splitting this column into multiple columns using tidyr's separate():
# d is a data.table containing my data
d2 <- d %>% separate(column_to_separate, paste0("S", 1:number_of_final_columns))
It works, but it takes forever. How can I split this column faster (either in R or in SQLite)?
You may use the tidyfast package (see here), which builds on data.table. In this test, it is approximately three times faster:
test <- data.frame(
  long.var = rep(paste0("V", 1:1000, "$", collapse = ""), 1000)
)

system.time({
  test |>
    tidyr::separate(long.var, into = paste0("N", 1:1001), sep = "\\$")
})
#>    user  system elapsed
#>   0.352   0.012   0.365

system.time({
  test |>
    tidyfast::dt_separate(long.var, into = paste0("N", 1:1001), sep = "\\$")
})
#>    user  system elapsed
#>   0.117   0.000   0.118
Created on 2023-02-03 with reprex v2.0.2
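Since tidyfast delegates to data.table under the hood, another option worth trying is calling data.table directly. A minimal sketch, not from the original answer, using data.table's `tstrsplit()`, which splits a character vector into a list of columns in a single pass (note that `tstrsplit()` drops the trailing empty string produced by the final "$", so it yields 1000 pieces here rather than 1001):

```r
library(data.table)

# Same test data as the benchmark above: 1000 rows of "V1$V2$...$V1000$"
test <- data.table(
  long.var = rep(paste0("V", 1:1000, "$", collapse = ""), 1000)
)

# fixed = TRUE treats "$" as a literal separator, not a regex anchor
pieces <- tstrsplit(test$long.var, "$", fixed = TRUE)

# Assign the split pieces as new columns N1, N2, ... by reference
test[, paste0("N", seq_along(pieces)) := pieces]
```

Because `:=` modifies the data.table by reference, this avoids copying the whole table, which can matter at this scale.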