I have a character string like this in R:
ABCDE,"January 10, 2010",F,,,,GH,"March 9, 2009",,,
I would like to do something like str.split() to split it on every combination of commas and quotation marks into an array of strings, while keeping the commas inside the quoted dates, so that I get:
ABCDE
January 10, 2010
F
GH
March 9, 2009
Thanks
If the pattern is as shown, one regex option is to turn the separating commas into a newline delimiter and then read the result with read.table:
read.table(text = gsub('"', '',
    gsub('("[^,"]+,)(*SKIP)(*FAIL)|,', '\n',
      trimws(gsub(",{2,}", ",", str1), whitespace = ","), perl = TRUE)),
  header = FALSE, fill = TRUE, sep = "\n")
-output
V1
1 ABCDE
2 January 10, 2010
3 F
4 GH
5 March 9, 2009
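The nested call above can be hard to read, so here is the same pipeline unrolled into named steps. This is only a sketch: the names step1 to step4 and fields are made up for illustration, and it assumes each quoted field contains at most one comma, since the pattern only protects the first comma after an opening quote.

```r
str1 <- "ABCDE,\"January 10, 2010\",F,,,,GH,\"March 9, 2009\",,,"

# 1. Collapse runs of two or more commas into a single comma
step1 <- gsub(",{2,}", ",", str1)

# 2. Strip leading/trailing commas (the `whitespace` argument needs R >= 3.6.0)
step2 <- trimws(step1, whitespace = ",")

# 3. Replace commas with newlines, except the comma inside a quoted chunk:
#    (*SKIP)(*FAIL) throws away the match of '"January 10,' so that comma survives
step3 <- gsub('("[^,"]+,)(*SKIP)(*FAIL)|,', "\n", step2, perl = TRUE)

# 4. Drop the quotation marks
step4 <- gsub('"', "", step3)

# Split on the newline delimiter to get a plain character vector
fields <- strsplit(step4, "\n", fixed = TRUE)[[1]]
print(fields)
```

Using strsplit at the end instead of read.table gives a character vector rather than a one-column data frame, which is closer to what str.split() returns.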
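Or with scan, which treats double quotes as field quoting by default when sep is a comma (note that setdiff also drops any duplicated fields):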
data.frame(V1 = setdiff(scan(text = str1, sep = ",",
what = character()), ""))
-output
V1
1 ABCDE
2 January 10, 2010
3 F
4 GH
5 March 9, 2009
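If repeated values need to be kept, a possible variant of the scan approach filters empty fields with nzchar instead of setdiff. This is a sketch under the same assumption about the input format; the name fields is just illustrative.

```r
str1 <- "ABCDE,\"January 10, 2010\",F,,,,GH,\"March 9, 2009\",,,"

# scan() honours double quotes by default when sep is not "\n",
# so the commas inside the quoted dates stay in place
fields <- scan(text = str1, sep = ",", what = character(), quiet = TRUE)

# Keep only non-empty fields; unlike setdiff(), this keeps repeated values
fields <- fields[nzchar(fields)]
print(fields)
```

The result is a character vector; wrap it in data.frame(V1 = fields) if a data frame is preferred.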
data
str1 <- "ABCDE,\"January 10, 2010\",F,,,,GH,\"March 9, 2009\",,,"