Tags: r, text, replace, tm, tidytext

Replace a range of numbers with single numbers in a character string


Is there any way to replace a range of numbers with the individual numbers in a character string? A range has the form n-n, most probably somewhere within 1-15; 4-10 is also possible.

The range could be indicated a) with a hyphen (-):

a <- "I would like to buy 1-3 cats"

or b) with a word, for example: to, bis, jusqu'à

b <- "I would like to buy 1 jusqu'à 3 cats"

In both cases the result should look like this:

"I would like to buy 1,2,3 cats"

I found this question, "Replace range of numbers with certain number", but could not really use it in R.


Solution

  • gsubfn in the gsubfn package is like gsub, but instead of a replacement string it accepts a function (possibly in formula notation, as done here). It passes the matches to the capture groups of the regular expression, i.e. the matches to the parenthesized parts, to that function as separate arguments and replaces the entire match with the function's output. Here we match "(\\d+)(-| to | bis | jusqu'à )(\\d+)", which has three capture groups, so the function receives 3 arguments. In the function we call seq on the first and third of these, the two endpoints of the range. Note that seq accepts character arguments and interprets them as numeric, so the arguments do not have to be converted explicitly.

    Thus we get this one-liner:

    library(gsubfn)
    s <- c(a, b) # test input strings from the question
    
    # ..1 and ..3 are the first and third capture groups (the range endpoints)
    gsubfn("(\\d+)(-| to | bis | jusqu'à )(\\d+)", ~ paste(seq(..1, ..3), collapse = ","), s)
    

    giving:

    [1] "I would like to buy 1,2,3 cats" "I would like to buy 1,2,3 cats"
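
If installing gsubfn is not an option, the same idea can probably be sketched in base R with gregexpr and regmatches<-. This is only a minimal sketch, not part of the answer above: expand_ranges is a hypothetical helper name, it reuses the same regular expression, and it assumes the input strings and the locale handle the "à" in "jusqu'à" consistently (e.g. UTF-8).

```r
# Base R sketch: expand "1-3", "1 to 3", "1 bis 3", "1 jusqu'à 3" into "1,2,3".
# expand_ranges is a hypothetical helper name, not a function from any package.
expand_ranges <- function(x) {
  pattern <- "(\\d+)(-| to | bis | jusqu'à )(\\d+)"
  m <- gregexpr(pattern, x)              # locate every range in every string
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    # hits holds the matched ranges of one input string (possibly none)
    vapply(hits, function(h) {
      # the two numbers inside a match are the endpoints of the range
      ends <- as.numeric(regmatches(h, gregexpr("\\d+", h))[[1]])
      paste(seq(ends[1], ends[2]), collapse = ",")
    }, character(1), USE.NAMES = FALSE)
  })
  x
}

a <- "I would like to buy 1-3 cats"
b <- "I would like to buy 1 jusqu'à 3 cats"
expand_ranges(c(a, b))
```

For the two test strings this should give the same result as the gsubfn call. Unlike the gsubfn version, the endpoints are pulled out of each match with a second regular expression, because gregexpr does not return the capture groups directly.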