r, dataframe, vector, strsplit

Splitting out my data when the length of each varies


I have a character vector that I want to split into a data frame. As the example data below shows, I want each number in a cell to go into its own column, with NA filling the positions that have no value. I tried splitting the data with gsub, but because each cell contains anywhere from one to three '-', I don't get the result I am looking for. I thought unlist(strsplit()) would work for this, but I wasn't able to split the original data correctly, again because of the differing lengths.

mydat
c("1-2 1-1", "1-2 1-1 3-3", "1-1 2-1 4-1", "1-1")

newdat
one   two   three   four   five   six
1     2     1       1      NA     NA
1     2     1       1      3      3
1     1     2       1      4      1
1     1     NA      NA     NA     NA
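
The difficulty with unlist(strsplit()) described above can be reproduced with a short sketch. It assumes the data is stored in a variable named mydat and that both '-' and whitespace act as separators. strsplit() returns one vector per input string, but unlist() concatenates them into a single flat vector, so the information about which values belong to which row is lost.

```r
# Data from the question, assumed to be stored as `mydat`
mydat <- c("1-2 1-1", "1-2 1-1 3-3", "1-1 2-1 4-1", "1-1")

# Split on a hyphen or a whitespace character: one vector per input string
parts <- strsplit(mydat, "-|\\s")
lengths(parts)   # 4 6 6 2, the pieces have different lengths

# unlist() flattens everything into one vector of 18 values,
# so there is no longer any way to tell where each row starts or ends
flat <- unlist(parts)
length(flat)     # 18
```

Keeping the list returned by strsplit() intact, rather than flattening it, is what makes padding each row with NA possible.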

Solution

  • Use strsplit, then pad each piece to the longest length with `length<-`, which fills the missing positions with NA.

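    > fn <- \(x) {
    +   # split each string on '-' or whitespace
    +   s <- strsplit(x, '-|\\s')
    +   # pad every piece with NA to the longest length, one row per input string
    +   t(sapply(s, `length<-`, max(lengths(s)))) |>
    +     as.data.frame() |> type.convert(as.is=TRUE)
    + }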
    > fn(x) |> setNames(c('one', 'two', 'three', 'four', 'five', 'six'))
      one two three four five six
    1   1   2     1    1   NA  NA
    2   1   2     1    1    3   3
    3   1   1     2    1    4   1
    4   1   1    NA   NA   NA  NA
    

    Data:

    > dput(x)
    c("1-2 1-1", "1-2 1-1 3-3", "1-1 2-1 4-1", "1-1")
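
For readers who want to see what each stage of fn produces, here is a step-by-step sketch of the same idea. It is an unrolled variant rather than the answer's exact code: the intermediate names s, n, padded and res are only illustrative, and it uses lapply with do.call(rbind, ...) in place of t(sapply(...)).

```r
# Same input as in the Data section
x <- c("1-2 1-1", "1-2 1-1 3-3", "1-1 2-1 4-1", "1-1")

# Step 1: split each string on a hyphen or a whitespace character
s <- strsplit(x, "-|\\s")

# Step 2: the longest piece decides the number of columns
n <- max(lengths(s))

# Step 3: `length<-` extends each vector to length n, filling with NA
padded <- lapply(s, `length<-`, n)

# Step 4: stack the padded vectors as rows, then convert text to integers
res <- as.data.frame(do.call(rbind, padded))
res <- type.convert(res, as.is = TRUE)
names(res) <- c("one", "two", "three", "four", "five", "six")
res
```

The key step is `length<-`: assigning a length larger than the current one pads the vector with NA, which is what turns the ragged list into a rectangular table.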