Tags: r, regex, gsub, regex-group, data-masking

Replace the digits beyond the fifth in numbers longer than 5 digits in a text


a <- c("this is a number 9999333333 and i got 12344")

How can I mask any number longer than 5 digits so that the first five digits are kept and each extra digit is replaced with "X"?

Expected Output:

"this is a number 99993XXXXX and i got 12344"

Code I tried:

gsub("(.{5}).*", "X", a)

Solution

  • You can use gsub with a PCRE regex:

    (?:\G(?!^)|(?<!\d)\d{5})\K\d
    

    See the regex demo. Details:

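    • (?:\G(?!^)|(?<!\d)\d{5}) - either the end of the previous successful match (\G(?!^), where (?!^) rules out the start of the string) or (|) five digits that are not preceded by a digit ((?<!\d)\d{5}), i.e. the first five digits of a number
    • \K - match reset operator that discards all text matched so far, so the five leading digits are kept rather than replaced
    • \d - a digit; this is the only character left in the match, so each extra digit is replaced with one X.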

    See the R demo:

    a <- c("this is a number 9999333333 and i got 12344")
    gsub("(?:\\G(?!^)|(?<!\\d)\\d{5})\\K\\d", "X", a, perl=TRUE)
    ## => [1] "this is a number 99993XXXXX and i got 12344"
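
If the number of digits to keep may vary, the same pattern can be built from a parameter. The following is only a minimal sketch: the function name `mask_extra_digits` and its arguments `keep` and `mask` are hypothetical names introduced for illustration and are not part of the answer above. It assumes `mask` is a plain string without backslashes, since `gsub` treats backslashes in the replacement specially.

```r
# Hypothetical helper (not from the original answer): keep the first `keep`
# digits of every digit run and replace each remaining digit with `mask`.
mask_extra_digits <- function(x, keep = 5L, mask = "X") {
  stopifnot(is.numeric(keep), length(keep) == 1L, keep >= 1)
  # Same PCRE pattern as above, with the digit count taken from `keep`.
  pattern <- sprintf("(?:\\G(?!^)|(?<!\\d)\\d{%d})\\K\\d", as.integer(keep))
  gsub(pattern, mask, x, perl = TRUE)
}

a <- c("this is a number 9999333333 and i got 12344")
mask_extra_digits(a)
## => [1] "this is a number 99993XXXXX and i got 12344"
mask_extra_digits("id 1234567", keep = 3)
## => [1] "id 123XXXX"
```

Because `gsub` is vectorized over `x`, the helper can be applied to a whole character vector or a data frame column in one call.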