
Split a string every n letters, ignoring special characters


I need a way to split a string every n letters.

For example, with s = "QW%ERT%ZU%I%O%P" and n = 3, I want to obtain "QW%E" "RT%Z" "U%I%O" "%P".

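As you can see, the special character "%" is not counted: each chunk contains n letters (the last one may contain fewer), and a "%" stays attached to the letter that follows it.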
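To make the rule concrete, here is a minimal plain-loop sketch. It is not part of the original question; the helper name `split_by_letters` is hypothetical, and it assumes that a "letter" is any character matched by `\w` and that every other character joins the chunk of the letter that follows it.

```r
# Hypothetical helper: split `s` into chunks holding n word characters (\w) each.
# Any non-word character (such as "%") is kept with the letter that follows it.
split_by_letters <- function(s, n) {
  chars <- strsplit(s, "")[[1]]
  chunks <- character(0)
  current <- ""
  letters_seen <- 0
  for (ch in chars) {
    # Once the current chunk holds n letters, the next character opens a new chunk
    if (letters_seen == n) {
      chunks <- c(chunks, current)
      current <- ""
      letters_seen <- 0
    }
    current <- paste0(current, ch)
    # Only word characters count towards the n letters
    if (grepl("\\w", ch)) letters_seen <- letters_seen + 1
  }
  # Keep the final, possibly shorter, chunk
  if (nzchar(current)) chunks <- c(chunks, current)
  chunks
}

split_by_letters("QW%ERT%ZU%I%O%P", 3)
# returns c("QW%E", "RT%Z", "U%I%O", "%P")
```

Because the loop looks at one character at a time, runs of several non-word characters (e.g. "%%") are kept rather than dropped.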

I tried

    strsplit(s, "(?<=.{10})(?=.*\\%)", perl = TRUE)[[1]]

but I cannot find a pattern that gives the result I want.


Solution

  • Use regmatches (instead of strsplit) with a pattern that matches up to n letters, each optionally preceded by one non-word character:

    > s <- "QW%ERT%ZU%I%O%P"
    > n <- 3
    > regmatches(s, gregexpr(sprintf("(\\W?\\w){1,%i}", n), s))
    [[1]]
    [1] "QW%E"  "RT%Z"  "U%I%O" "%P"
    

    Alternatively, use tapply + strsplit:

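    v <- unlist(strsplit(s, ""))   # individual characters
    l <- which(grepl("\\w", v))    # positions of the letters (word characters)
    # The last letter of each group of n letters ends a chunk, so the character
    # right after it starts a new one; cumsum() turns these start positions into
    # chunk ids, and tapply() pastes the characters of each chunk together.
    tapply(
        v,
        cumsum(seq_along(v) %in% (1 + by(l, ceiling(seq_along(l) / n), max))),
        paste0,
        collapse = ""
    )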
    

    which gives

          0       1       2       3
     "QW%E"  "RT%Z" "U%I%O"    "%P"
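The regmatches() approach can be wrapped into a small function that also accepts a vector of strings. This is a sketch built only on base R's sprintf(), gregexpr() and regmatches(); the name `split_every_n` is hypothetical.

```r
# Hypothetical wrapper around the regmatches() approach.
# Each match is up to n repetitions of "an optional non-word character
# followed by one word character", so only \w characters are counted.
split_every_n <- function(x, n) {
  pattern <- sprintf("(\\W?\\w){1,%d}", as.integer(n))
  regmatches(x, gregexpr(pattern, x))  # one character vector per input string
}

res <- split_every_n(c("QW%ERT%ZU%I%O%P", "AB%CD"), 2)
# res[[1]] is c("QW", "%ER", "T%Z", "U%I", "%O%P")
# res[[2]] is c("AB", "%CD")
```

Because the pattern allows at most one non-word character before each letter, a run of two or more such characters, or one at the very end of the string, is not matched and disappears from the result: split_every_n("AB%%C", 3)[[1]] is c("AB", "%C").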