
R: How to Generate Sequences in a Tibble Given Start and End Points


I can't think how to do this in a tidy fashion.

I have a table as follows:

tibble(
  Min = c(1, 5, 12, 13, 19),
  Max = c(3, 11, 12, 14, 19),
  Value = c("a", "bb", "c", "d", "e")
)

and I want to generate another table from it, with one row for every integer from Min to Max and the matching Value, as shown below:

tibble(
  Row = c(1:3, 5:11, 12:12, 13:14, 19:19),
  Value = c(rep("a", 3), rep("bb", 7), "c", "d", "d", "e")
)

Grateful for any suggestions folk might have. The only 'solutions' which come to mind are a bit cumbersome.


Solution

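    1) If DF is the input, group by Value and expand each group into its sequence. This assumes the Value entries are unique, so that every group holds exactly one Min/Max pair: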

    library(dplyr)
    
    DF %>%
      group_by(Value) %>%
      # each group is a single row, so .$Min and .$Max are scalars here
      group_modify(~ tibble(Row = seq(.$Min, .$Max))) %>%
      ungroup()
    

    giving:

    # A tibble: 14 x 2
       Value   Row
       <chr> <int>
     1 a         1
     2 a         2
     3 a         3
     4 bb        5
     5 bb        6
     6 bb        7
     7 bb        8
     8 bb        9
     9 bb       10
    10 bb       11
    11 c        12
    12 d        13
    13 d        14
    14 e        19
    

    2) This one works row by row: it creates a list column L holding one tibble per input row and then unnests it. Because it does not group by Value, duplicate Value elements are fine with this one.

    library(dplyr)
    library(tidyr)
    
    DF %>%
      rowwise() %>%
      # one tibble per input row; Value is recycled along the sequence
      summarize(L = list(tibble(Value, Row = seq(Min, Max)))) %>%
      ungroup() %>%
      unnest(L)
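
To see why the duplicate case matters, here is a minimal, self-contained sketch of approach 2 on the question's data plus one extra row. The sixth row (Min = 21, Max = 22, Value = "a") is hypothetical and is not in the original question; it is there only to repeat a Value. Under approach 1 that repeat would put two ranges into the same group, so seq() would be handed vectors rather than single endpoints. The row-wise version never groups by Value, so it should expand each row independently. The columns are listed as Row, Value here purely to match the layout the question asked for.

```r
library(dplyr)
library(tidyr)

# Data from the question, plus one hypothetical row that repeats Value "a".
DF <- tibble(
  Min   = c(1, 5, 12, 13, 19, 21),
  Max   = c(3, 11, 12, 14, 19, 22),
  Value = c("a", "bb", "c", "d", "e", "a")
)

# Approach 2: build one small tibble per input row, then unnest the list column.
res <- DF %>%
  rowwise() %>%
  summarize(L = list(tibble(Row = seq(Min, Max), Value))) %>%
  ungroup() %>%
  unnest(L)

print(res, n = Inf)
```

The result should have one row per integer in each Min to Max range, with the two "a" ranges kept apart (rows 1 to 3 at the top, rows 21 and 22 at the bottom) rather than merged.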