Example data file (csv format)
testdf <- read.csv("example.csv")
I am trying to automate some roster-mining. At one point I need to split rows based on names with separators, so cSplit from splitstackshape is perfect. I am also preceding and following the split with a bunch of dplyr data shaping.
loaded libraries:
library(data.table)
library(splitstackshape)
library(tidyr)
library(dplyr)
The problem is that when I load dplyr after data.table, I get the following message:
Attaching package: ‘dplyr’
The following objects are masked from ‘package:data.table’:
between, last
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Then when I try to use cSplit:
test <- cSplit(testdf, "Registrar", "/", direction = "long")
I get this error:
Error in `[.tbl_df`(indt, , splitCols, with = FALSE) :
unused argument (with = FALSE)
I have tried various permutations. The error only occurs when both data.table and dplyr are loaded (in either order); restarting R and never loading dplyr makes cSplit work properly.
I need to be able to use both at the same time though, and detaching dplyr doesn't help (just throws up missing dplyr errors).
I have seen this thread but they seem to have come to the conclusion the data is corrupted. This seems likely because if I run on a toy data set,
Name <- "Bo / Ashley"
Date <- "2015-02-04"
testdf2 <- data.frame(Name, Date)
testtoy <- cSplit(testdf2, "Name", "/", direction = "long")
it works fine. But I have no idea how to fix this "corruption".
I haven't updated the functions in "splitstackshape" to work with tbl_df objects. As such, the current workaround would be to add a data.frame in your chain.
Compare:
library(splitstackshape)
library(dplyr)
CT <- tbl_df(head(concat.test))
CT %>% cSplit("Likes")
# Error in `[.tbl_df`(indt, , splitCols, with = FALSE) :
# unused argument (with = FALSE)
CT %>% data.frame %>% cSplit("Likes")
# Name Siblings Hates Likes_1 Likes_2 Likes_3 Likes_4 Likes_5
# 1: Boyd Reynolds , Albert , Ortega 2;4; 1 2 4 5 6
# 2: Rufus Cohen , Bert , Montgomery 1;2;3;4; 1 2 4 5 6
# 3: Dana Pierce 2; 1 2 4 5 6
# 4: Carole Colon , Michelle , Ballard 1;4; 1 2 4 5 6
# 5: Ramona Snyder , Joann , 1;2;3; 1 2 5 6 NA
# 6: Kelley James , Roxanne , 1;4; 1 2 5 6 NA
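Applied to the kind of data in the question, the same conversion might look like the sketch below. The `roster` object is a made-up stand-in for the real file (only the `Registrar` column name and the "/" separator come from the question), and `as.data.frame()` is used as the explicit form of the `data.frame` step above.

```r
library(splitstackshape)
library(dplyr)

# Hypothetical stand-in for the roster data; the real file is not shown here
roster <- tibble(
  Registrar = c("Bo / Ashley", "Dana"),
  Date = c("2015-02-04", "2015-02-05")
)

# A tibble carries the "tbl_df" class, which is why `[.tbl_df` shows up
# in the error message
class(roster)

# Drop the tibble class before handing the data to cSplit
long <- roster %>%
  as.data.frame() %>%
  cSplit("Registrar", "/", direction = "long")

long
```

If `class(testdf)` includes "tbl_df" at the point where cSplit is called, that would explain why the toy data.frame works while the real data does not, without any corruption being involved.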
Alternatively, since with = FALSE is an argument for use in "data.table", you can use tbl_dt instead of tbl_df objects:
CT2 <- tbl_dt(head(concat.test))
CT2 %>% cSplit("Likes")
# Name Siblings Hates Likes_1 Likes_2 Likes_3 Likes_4 Likes_5
# 1: Boyd Reynolds , Albert , Ortega 2;4; 1 2 4 5 6
# 2: Rufus Cohen , Bert , Montgomery 1;2;3;4; 1 2 4 5 6
# 3: Dana Pierce 2; 1 2 4 5 6
# 4: Carole Colon , Michelle , Ballard 1;4; 1 2 4 5 6
# 5: Ramona Snyder , Joann , 1;2;3; 1 2 5 6 NA
# 6: Kelley James , Roxanne , 1;4; 1 2 5 6 NA
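If tbl_dt is not available in your installed dplyr (as far as I know it was later moved out of the package), converting to a plain data.table should serve the same purpose. This is a minimal sketch under that assumption; `CT3` and `out` are just illustrative names.

```r
library(splitstackshape)
library(data.table)
library(dplyr)

# Convert to a data.table so that `[` accepts data.table arguments
# such as with = FALSE
CT3 <- as.data.table(head(concat.test))
class(CT3)

out <- CT3 %>% cSplit("Likes")
out
```

The pipe still comes from dplyr here; only the class of the object passed to cSplit changes.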
Of course, if someone creates a pull request that solves the issue, I would be more than happy to make the relevant updates :-)