
Split comma-separated strings of unequal length into columns and fill with missing values


I have a data frame with comma-separated strings:

df <- data.frame(x = c("a,b,c", "a", "a,b"))

I'd like to split the strings into separate columns, giving 3 new columns. Rows with fewer than 3 elements should have the remaining columns filled with missing values.

So far I have tried the strsplit function:

dfb <- strsplit(df, ",")

This returns an error:

non-character argument
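strsplit() raises "non-character argument" because it was given the whole data frame (a list) rather than the character column df$x. A minimal base-R sketch of the strsplit route, assuming every row should be padded with NA up to the longest row; the intermediate names parts, padded and res are illustrative only:

```r
df <- data.frame(x = c("a,b,c", "a", "a,b"))

# strsplit() needs a character vector, not a data frame; as.character()
# also guards against x being a factor (the default before R 4.0.0)
parts <- strsplit(as.character(df$x), ",")

# Pad every piece to the length of the longest one: assigning a larger
# length() fills the new slots with NA
n <- max(lengths(parts))
padded <- lapply(parts, `length<-`, n)

# Bind the padded vectors as rows of a character matrix, then convert
res <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(res) <- c("X", "Y", "Z")  # assumes 3 columns, as in this example
res
```

The names line is specific to this example; for data of unknown width, names would have to be generated from n instead.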

I have also tried tidyr's separate(), which offers the additional fill = "right" option:

dfnew2 <- separate(df, c("X","Y"), sep = ",", fill = "right")

This returns the error:

var must evaluate to a single number or a column name, not a character vector
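The separate() error most likely comes from the missing column argument: the second argument of tidyr::separate() is col, so c("X","Y") was interpreted as the column to split. Only two output names were given for up to three pieces as well. A corrected sketch, assuming the tidyr package is installed:

```r
library(tidyr)

df <- data.frame(x = c("a,b,c", "a", "a,b"))

# col = x names the column to split; into supplies one name per piece.
# fill = "right" pads short rows with NA on the right without a warning
res <- separate(df, col = x, into = c("X", "Y", "Z"), sep = ",",
                fill = "right")
res
```

In tidyr 1.3.0 and later, separate() is marked as superseded by separate_wider_delim(), where too_few = "align_start" plays the role of fill = "right".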

The expected result is a data frame like this:

X Y   Z
a b   c
a n/a n/a
a b   n/a

Do you have any suggestions? Many thanks!


Solution

  • Use read.table:

    read.table(text = as.character(df$x), sep = ",", as.is = TRUE, fill = TRUE,
      na.strings = "")
    

    giving:

      V1   V2   V3
    1  a    b    c
    2  a <NA> <NA>
    3  a    b <NA>
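One caveat with this approach: read.table() decides the number of columns from the first five lines of input, or from col.names if that is supplied and longer. If a wider row appears later, fill = TRUE can wrap it onto an extra row. A sketch that counts the fields in every line first and then fixes the width through col.names (the names txt, con, n and res are illustrative):

```r
df <- data.frame(x = c("a,b,c", "a", "a,b"))
txt <- as.character(df$x)

# Count fields in every line, not just the first five, and keep the max
con <- textConnection(txt)
n <- max(count.fields(con, sep = ","))
close(con)

# A col.names vector of length n forces exactly n columns; short rows are
# padded by fill = TRUE and the resulting blanks become NA via na.strings
res <- read.table(text = txt, sep = ",", as.is = TRUE, fill = TRUE,
                  na.strings = "", col.names = paste0("V", seq_len(n)))
res
```

When the width is known in advance, passing col.names = c("X", "Y", "Z") directly gives the column names from the question.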