How to select the number in a text? (R)

I want to convert the Latin and English numbers in a text into digits. For example, in the text "……one telephone……", I want to change the English number "one" into "1", but I do not want to change "telephone" into "teleph1".

Is it right to match the number only when there is a space before the word and a space after it? How can I do that?

Solution

To avoid replacing parts of other words with numbers, you can include word boundaries in the search pattern. Some functions have a dedicated option for this, but in general you can use the regular-expression token \b (written "\\b" inside an R string) to mark a word boundary, as long as the function supports regular expressions.

For example, "\\bone\\b" matches "one" only when it is not part of another word. That way you can apply it to your character string "……one telephone……" without having to rely on spaces as delimiters between words.
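To illustrate the difference, here is a minimal base R sketch comparing a plain pattern, a space-delimited pattern, and a word-boundary pattern. The test string and the variable names (`plain`, `spaced`, `bounded`) are made up for this illustration; `perl = TRUE` is just one way to make sure \b is interpreted as a word boundary.

```r
# hypothetical test string with "one" as a word and inside other words
x <- "one telephone, someone, one"

# no boundaries: "one" is also replaced inside "telephone" and "someone"
plain <- gsub("one", "1", x)

# spaces as delimiters: misses "one" at the start and end of the string
spaced <- gsub(" one ", " 1 ", x)

# word boundaries: only the standalone word "one" is replaced
bounded <- gsub("\\bone\\b", "1", x, perl = TRUE)

print(plain)
#> [1] "1 teleph1, some1, 1"
print(spaced)
#> [1] "one telephone, someone, one"
print(bounded)
#> [1] "1 telephone, someone, 1"
```

The space-delimited pattern leaves the string unchanged here because neither standalone "one" has a space on both sides, which is why word boundaries tend to be the more robust choice.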

With the stringr package (part of the Tidyverse), the replacement might look like this:

# define test string
x <- "……one telephone……"

# define dictionary for replacements with \\b indicating word boundaries
dict <- c(
  "\\bone\\b" = "1",
  "\\btwo\\b" = "2",
  "\\bthree\\b" = "3"
)

# replace matches in x
stringr::str_replace_all(x, dict)
#> [1] "……1 telephone……"

Created on 2022-11-11 with reprex v2.0.2
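If you would rather not depend on stringr, the same idea can be sketched in base R by looping over a dictionary and wrapping each word in \\b. The helper name `replace_number_words` and the test string are hypothetical, and `ignore.case = TRUE` is an added assumption (so that "One" is also matched) that the original question does not ask for.

```r
# dictionary of plain words; the word boundaries are added inside the helper
dict <- c(one = "1", two = "2", three = "3")

# hypothetical helper: replace each dictionary word only where it stands alone
replace_number_words <- function(text, dict) {
  for (word in names(dict)) {
    pattern <- paste0("\\b", word, "\\b")
    # ignore.case = TRUE is an assumption, not part of the original answer
    text <- gsub(pattern, dict[[word]], text, ignore.case = TRUE, perl = TRUE)
  }
  text
}

result <- replace_number_words("One telephone, two or three", dict)
print(result)
#> [1] "1 telephone, 2 or 3"
```

Keeping the boundaries out of the dictionary makes it easier to extend the word list, at the cost of one `gsub()` call per entry.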