How to get empty last elements from strsplit() in R?


I need to process some data that are mostly csv. The problem is that strsplit() drops the empty field after a comma that comes at the end of a line (e.g., the comma after 3 in the example below).

> strsplit("1,2,3,", ",")
[[1]]
[1] "1" "2" "3"

I'd like it to be read in as [1] "1" "2" "3" NA instead. How can I do this? Thanks.


Solution

  • Here are a couple of ideas:

    scan(text="1,2,3,", sep=",", quiet=TRUE)
    #[1]  1  2  3 NA
    
    unlist(read.csv(text="1,2,3,", header=FALSE), use.names=FALSE)
    #[1]  1  2  3 NA
    

    Both return numeric vectors (scan() gives doubles by default, read.csv() gives integers here). You can wrap as.character around either of them to get the exact output you show in the question:

    as.character(scan(text="1,2,3,", sep=",", quiet=TRUE))
    #[1] "1" "2" "3" NA 
    

    Or you could specify what="character" in scan, or colClasses="character" in read.csv. The output is slightly different: the empty last field comes back as an empty string "" rather than NA.

    scan(text="1,2,3,", sep=",", quiet=TRUE, what="character")
    #[1] "1" "2" "3" "" 
    
    unlist(read.csv(text="1,2,3,", header=FALSE, colClasses="character"), use.names=FALSE)
    #[1] "1" "2" "3" "" 
    

    You could also specify na.strings="" along with colClasses="character" so that the empty string is read as NA:

    unlist(read.csv(text="1,2,3,", header=FALSE, colClasses="character", na.strings=""), 
           use.names=FALSE)
    #[1] "1" "2" "3" NA
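
If you have many lines to process, the character-based scan() approach can be wrapped in a small function. This is only a minimal sketch: the name split_keep_trailing is made up for illustration (it is not a base R function), and it assumes the fields contain no embedded quotes or escaped separators.

```r
# Hypothetical helper (not part of base R): split each line on `sep`,
# keeping empty fields, including a trailing one, and returning them as NA.
split_keep_trailing <- function(lines, sep = ",") {
  lapply(lines, function(line) {
    # what = "character" makes scan() return "" for empty fields
    fields <- scan(text = line, sep = sep, what = "character", quiet = TRUE)
    # turn empty strings into missing values
    fields[!nzchar(fields)] <- NA_character_
    fields
  })
}

# Returns a list with one character vector per input line
split_keep_trailing(c("1,2,3,", "4,5,6"))
```

Dropping the line that assigns NA_character_ should keep the empty fields as "" instead.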