regex, r, parsing, strsplit

Why does strsplit return a list?


Consider

text <- "who let the dogs out"
fooo <- strsplit(text, " ")
fooo
[[1]]
[1] "who"  "let"  "the"  "dogs" "out" 

The output of strsplit is a list whose first element is a character vector containing the words above.

Why does the function behave that way? Is there any case in which it would return a list with more than one element?

I can access an individual word using

fooo[[1]][1]
[1] "who"

but is there no simpler way?


Solution

  • On your first question: strsplit is vectorized over x, so it returns one result vector per input string, and a list can hold result vectors of different lengths in a single object:

    text <- "who let the dogs out"
    vtext <- c(text, "who let the")
    ##
    > strsplit(text, " ")
    [[1]]
    [1] "who"  "let"  "the"  "dogs" "out" 
    
    > strsplit(vtext, " ")
    [[1]]
    [1] "who"  "let"  "the"  "dogs" "out" 
    
    [[2]]
    [1] "who" "let" "the"
    

    If the result were returned as a data.frame, matrix, etc. instead of a list, the shorter vectors would have to be padded with additional elements.
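
As a rough sketch of the padding point, and of one way to shorten the access from the question: the names parts, n, padded and words below are hypothetical, and padding with NA is just one assumed choice for what a rectangular result would need. When the input is a single string, extracting the first list element once gives a plain character vector that can be indexed directly.

```r
text  <- "who let the dogs out"
vtext <- c(text, "who let the")

# One character vector per input string, so the lengths can differ.
parts <- strsplit(vtext, " ")
lengths(parts)                  # 5 3

# A rectangular result would need the shorter vector padded (here with NA).
n <- max(lengths(parts))
padded <- t(sapply(parts, function(p) c(p, rep(NA_character_, n - length(p)))))
dim(padded)                     # 2 5

# Single input string: take the first list element once, then index it.
words <- strsplit(text, " ")[[1]]
words[1]                        # "who"
```

For a single string, unlist(strsplit(text, " ")) gives the same character vector as strsplit(text, " ")[[1]].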