Search code examples
rdataframetext

Duplicate words within a string by variable amount


I have a dataframe where one column contains an ID, and the other column contains a series of words with numbers associated with them. As and example:

ColA   ColB
 A     Banana:2;Kiwi:4;Apple:1
 B     Kiwi:1;Orange:4

I would like to duplicate each of the words in ColB by the number associated with it. So, the output I'm looking for would be:

ColA   ColB
 A     Banana;Banana;Kiwi;Kiwi;Kiwi;Kiwi;Apple
 B     Kiwi;Orange;Orange;Orange;Orange

Any suggestions or solutions would be greatly appreciated! Thanks


Solution

  • cbind(df, C = sapply(strsplit(df$ColB, ';'), 
        \(x)paste(do.call(rep, unname(read.table(text=x, sep=':'))), collapse = ';')))
    
      ColA                    ColB                                       C
    1    A Banana:2;Kiwi:4;Apple:1 Banana;Banana;Kiwi;Kiwi;Kiwi;Kiwi;Apple
    2    B         Kiwi:1;Orange:4        Kiwi;Orange;Orange;Orange;Orange
    

    df %>%
       separate_longer_delim(ColB, delim = ';')%>%
       separate_wider_delim(ColB, names = c('x','times'), delim = ':')%>%
       uncount(as.numeric(times))%>%
       summarise(ColB = str_c(x, collapse = ';'), .by = ColA)
    
    # A tibble: 2 × 2
      ColA  ColB                                   
      <chr> <chr>                                  
    1 A     Banana;Banana;Kiwi;Kiwi;Kiwi;Kiwi;Apple
    2 B     Kiwi;Orange;Orange;Orange;Orange  
    

    transform(df, ColB = gsubfn::gsubfn("(\\w+):(\\d+)", \(x,y)paste(rep(x,as.numeric(y)), collapse = ";"), ColB))
      ColA                                    ColB
    1    A Banana;Banana;Kiwi;Kiwi;Kiwi;Kiwi;Apple
    2    B        Kiwi;Orange;Orange;Orange;Orange