
How to split a string of characters by commas but keep dates?


I have a character string like this in R:

    ABCDE,"January 10, 2010",F,,,,GH,"March 9, 2009",,,

I would like to do something like str.split() to split the string on its commas into a vector of strings, dropping the quotation marks and the empty fields, but keeping the commas inside the quoted dates, so that I get:

    ABCDE
    January 10, 2010
    F
    GH
    March 9, 2009

Thanks


Solution

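  • If the pattern is always as shown, one regex option is to turn every comma outside the quoted dates into a newline delimiter and then read the result with read.table:

    s <- trimws(gsub(",{2,}", ",", str1), whitespace = ",")        # drop empty fields and edge commas
    s <- gsub('("[^,"]+,)(*SKIP)(*FAIL)|,', "\n", s, perl = TRUE)  # skip the comma inside each quoted date; other commas -> newlines
    read.table(text = gsub('"', "", s), header = FALSE, fill = TRUE, sep = "\n")  # strip quotes, read one field per line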
    

    -output

                    V1
    1            ABCDE
    2 January 10, 2010
    3                F
    4               GH
    5    March 9, 2009
    

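    Or with scan, which already treats each double-quoted date as a single field, so only the empty fields need to be dropped:

    x <- scan(text = str1, sep = ",", what = character(), quiet = TRUE)
    data.frame(V1 = x[nzchar(x)])  # keep non-empty fields (setdiff() would also remove repeated values)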
    

    -output

                    V1
    1            ABCDE
    2 January 10, 2010
    3                F
    4               GH
    5    March 9, 2009
    

    data

    str1 <- "ABCDE,\"January 10, 2010\",F,,,,GH,\"March 9, 2009\",,,"
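To see what each part of the read.table answer does, here is a minimal sketch that runs the same three substitutions one at a time. It swaps read.table() for strsplit() in the last step purely for illustration, and the names step1, step2 and step3 are introduced here, not taken from the answer. It assumes R >= 3.6.0, where trimws() gained its whitespace argument. Note that the (*SKIP)(*FAIL) pattern only protects the first comma inside each quoted field, which is enough for dates written as "Month D, YYYY".

```r
# The string from the question
str1 <- "ABCDE,\"January 10, 2010\",F,,,,GH,\"March 9, 2009\",,,"

# Step 1: collapse runs of commas (the empty fields) into a single comma,
# then trim commas from both ends (assumes R >= 3.6.0 for `whitespace =`)
step1 <- trimws(gsub(",{2,}", ",", str1), whitespace = ",")

# Step 2: '"' followed by non-comma, non-quote characters and a comma is the
# comma inside a quoted date; (*SKIP)(*FAIL) makes PCRE jump past that match
# without replacing it, and every remaining comma becomes a newline
step2 <- gsub('("[^,"]+,)(*SKIP)(*FAIL)|,', "\n", step1, perl = TRUE)
writeLines(step2)
# ABCDE
# "January 10, 2010"
# F
# GH
# "March 9, 2009"

# Step 3: drop the quotation marks and split on the newlines
# (strsplit() used here instead of read.table(), an illustrative swap)
step3 <- strsplit(gsub('"', "", step2), "\n", fixed = TRUE)[[1]]
print(step3)
```

If a quoted field could contain more than one comma, the scan() route is the safer choice, because scan() honours the quotes instead of relying on this pattern.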
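A caveat when dropping the empty fields after scan(): setdiff() returns unique values, so setdiff(x, "") silently removes a field that appears more than once. The sketch below uses a hypothetical input, str2 (not from the question), in which F appears twice, and compares setdiff() with plain indexing by nzchar().

```r
# Hypothetical input (not from the question): "F" appears twice
str2 <- "ABCDE,\"January 10, 2010\",F,,,,F,\"March 9, 2009\",,,"

# scan() with sep = "," honours the double quotes, so the date commas survive
fields <- scan(text = str2, sep = ",", what = character(), quiet = TRUE)

# setdiff() de-duplicates, so the second "F" is lost
via_setdiff <- setdiff(fields, "")

# nzchar() keeps every non-empty field, duplicates included
via_nzchar <- fields[nzchar(fields)]

print(length(via_setdiff))  # 4
print(length(via_nzchar))   # 5
```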