
Create New Lists Based on List Structure Pattern


I have some data that looks like this:

   dat <- c("Sales","Jim","Halpert","","",
            "Reception","Pam","Beasley","","",
            "Not.Manager","Dwight","Schrute","Bears","Beets","BattlestarGalactica","","",
            "Manager","Michael","Scott","","")

Each "chunk" of data is a consecutive run of values, with some blanks between chunks. I want to transform the data into a list of character vectors that looks like this:

    iwant <- list(
               c("Sales","Jim","Halpert"),
               c("Reception","Pam","Beasley"),
               c("Not.Manager","Dwight","Schrute","Bears","Beets","BattlestarGalactica"),
               c("Manager","Michael","Scott")
               )

Suggestions? I am using rvest and stringi. I do not want to add more packages.


Solution

  • You can build a grouping vector with rle, then use split and lapply:

    lapply(split(dat, with(rle(dat != ''), 
                 rep(cumsum(values), lengths))), function(x) x[x!= ''])
    
    #$`1`
    #[1] "Sales"   "Jim"     "Halpert"
    
    #$`2`
    #[1] "Reception" "Pam"       "Beasley"  
    
    #$`3`
    #[1] "Not.Manager"         "Dwight"    "Schrute"     "Bears"   "Beets"            
    #[6] "BattlestarGalactica"
    
    #$`4`
    #[1] "Manager" "Michael" "Scott"  
    

    The rle part creates the grouping vector to split on:

    with(rle(dat != ''), rep(cumsum(values), lengths))
    #[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 4
    

    After the split, lapply removes the empty strings from each element of the resulting list.
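
To make the one-liner easier to follow, here is a sketch that breaks it into its intermediate steps using only base R. The variable names (runs, run_id, group, chunks, result) are illustrative and not part of the original answer, and the final unname() call is an optional extra for when the "1".."4" names are not wanted.

```r
dat <- c("Sales","Jim","Halpert","","",
         "Reception","Pam","Beasley","","",
         "Not.Manager","Dwight","Schrute","Bears","Beets","BattlestarGalactica","","",
         "Manager","Michael","Scott","","")

# Step 1: run-length encode the "is non-empty" flags.
runs <- rle(dat != "")
runs$values   # alternates TRUE (chunk) and FALSE (blanks), one entry per run
runs$lengths  # 3 2 3 2 6 2 3 2

# Step 2: every TRUE run starts a new chunk, so cumsum() numbers the chunks.
# The FALSE run that follows a chunk inherits that chunk's number.
run_id <- cumsum(runs$values)   # 1 1 2 2 3 3 4 4

# Step 3: expand the run ids back to one id per element of dat.
group <- rep(run_id, runs$lengths)

# Step 4: split by group, then drop the blank separators inside each piece.
chunks <- lapply(split(dat, group), function(x) x[x != ""])

# Optional (an assumption about the desired shape): drop the names
# to get a plain, unnamed list of character vectors.
result <- unname(chunks)
```

One caveat with this approach: if dat began with blanks, those would land in a group numbered 0 that becomes character(0) after filtering. Wrapping the result in Filter(length, chunks) should remove such empty pieces.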