Tags: r, string, vector, symbols, rep

Generate strings of repeated characters according to table in R


I would like to generate a series of strings, each with a certain number of symbols, according to a table.

Given the following table:

Length    Start     End
  40       15        20
  50       18        23
  45       12        19
  40       13        18
  .        .         .
  .        .         .
  .        .         .

I would like to generate a list of strings, each with the length given in column 1 and filled with the "." symbol, in which the positions from Start (column 2) to End (column 3) are changed from "." to "X".

So, the desired output would be a vector of strings like:

[1] "..............XXXXXX...................."
[2] ".................XXXXXX..........................."
[3] "...........XXXXXXXX.........................."
[4] "............XXXXXX......................"

For instance, the first element has 14 ".", followed by 6 "X" (positions 15 to 20, inclusive), followed by 20 "." again, as specified in the first row of the table.

I would like to iterate over the three columns of each row in the table to generate a vector with as many strings as there are rows in the table, following the specifications for the number of "." and "X" symbols in each of them.

I have been testing different approaches with rep to iterate over the table, but could not get any of them to work.

Any help?


Solution

  • myfunc <- function(len, st, ed) paste(replace(rep(".", len), st:ed, "X"), collapse = "")
    mapply(myfunc, dat$Length, dat$Start, dat$End)
    # [1] "..............XXXXXX...................."           ".................XXXXXX..........................."
    # [3] "...........XXXXXXXX.........................."      "............XXXXXX......................"          
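To run the snippet above end to end, `dat` has to exist first. A minimal sketch, assuming the question's table is stored in a data frame named `dat` with columns `Length`, `Start` and `End` (the question does not show how the table is stored, so this construction is an assumption, and only the four visible rows are used):

```r
# Assumed construction of the question's table (four visible rows only).
dat <- data.frame(
  Length = c(40, 50, 45, 40),
  Start  = c(15, 18, 12, 13),
  End    = c(20, 23, 19, 18)
)

# Same function as in the answer.
myfunc <- function(len, st, ed) {
  paste(replace(rep(".", len), st:ed, "X"), collapse = "")
}

res <- mapply(myfunc, dat$Length, dat$Start, dat$End)

# Sanity check: each string should be as long as its Length entry.
nchar(res)
# [1] 40 50 45 40
```

Checking `nchar(res)` against `dat$Length` is a quick way to confirm that no row was built with the wrong total length.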
    

    Walk-through:

    • rep(".", len) starts the process, creating a character vector of "." of the appropriate length (one element per character)

    • replace(..., st:ed, "X") replaces the elements at positions st through ed (inclusive) with "X"

    • paste(., collapse = "") collapses the vector into a single string

    • While sapply and lapply iterate over a single vector or list, mapply (and Map) "zip" multiple same-length vectors/lists together, calling the function once per position with the corresponding element of each. Our one call to mapply above is effectively the same as

      c(myfunc(dat$Length[1], dat$Start[1], dat$End[1]),
        myfunc(dat$Length[2], dat$Start[2], dat$End[2]),
        myfunc(dat$Length[3], dat$Start[3], dat$End[3]), ...)
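The walk-through can be traced on a single row. A minimal sketch, with the first row's values hard-coded; the names `step1` to `step3`, `lens`, `sts` and `eds` are illustrative and do not come from the answer:

```r
# Same function as in the answer.
myfunc <- function(len, st, ed) paste(replace(rep(".", len), st:ed, "X"), collapse = "")

# First row of the question's table, hard-coded for illustration.
len <- 40; st <- 15; ed <- 20

step1 <- rep(".", len)                # character vector of 40 "."
step2 <- replace(step1, st:ed, "X")   # positions 15 to 20 become "X"
step3 <- paste(step2, collapse = "")  # collapsed into one string

length(step1)      # 40
sum(step2 == "X")  # 6, because 15:20 includes both endpoints
nchar(step3)       # 40

# mapply zips the vectors position by position; Map does the same
# but returns a list, so it is unlisted here for comparison.
lens <- c(40, 50); sts <- c(15, 18); eds <- c(20, 23)
via_mapply <- mapply(myfunc, lens, sts, eds)
via_map    <- unlist(Map(myfunc, lens, sts, eds))
identical(via_mapply, via_map)  # TRUE
```

Whether to use mapply or Map is mostly a question of the return type wanted: mapply simplifies to a character vector here, while Map always returns a list.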