
Determine repeating numbers in a string in R


I am trying to identify column values in a data frame that contain the same digit twice in succession. For instance:

> df
   ColA
1 66046
2 73947
3 67456
4 67217
5 66861
6 67658

I want to return 66046 and 66861, since the digit 6 appears twice in succession in each. I have tried the following:

df %>% filter(str_detect(as.String(df[1]), "[66]"))  # with and without the square brackets
df[unlist(gregexpr("[6]{2}[[:digit:]]", df[1])), ][1]

Needless to say, this doesn't work. Any help is appreciated.

Thanks


Solution

  • Use a capture group followed by a backreference to it:

    library(dplyr)
    library(stringr)
    df %>%
       filter(str_detect(ColA, "(\\d)\\1"))
    

    See proof

    NODE    EXPLANATION
    (       group and capture to \1:
    \d      digits (0-9)
    )       end of \1
    \1      what was matched by capture \1
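
The same pattern can also be tried in base R, without dplyr or stringr. The following is only a minimal sketch: it rebuilds the sample data from the question, and the names `has_repeat`, `repeated`, and `any_six` are made up for illustration. It assumes `ColA` is numeric and that `as.character()` renders each value as plain digits, which holds for these five-digit numbers but not for values that R prints in scientific notation.

```r
# Sample data copied from the question
df <- data.frame(ColA = c(66046, 73947, 67456, 67217, 66861, 67658))

# (\\d)\\1 : capture one digit, then require that same digit immediately after
has_repeat <- grepl("(\\d)\\1", as.character(df$ColA), perl = TRUE)

# Keep only the rows where a digit is doubled
repeated <- df[has_repeat, , drop = FALSE]
print(repeated)

# For contrast: "[66]" is a character class, equivalent to "[6]",
# so it flags every value that contains a single 6 anywhere
any_six <- grepl("[66]", as.character(df$ColA), perl = TRUE)
print(any_six)
```

Here `repeated` should hold only 66046 and 66861, while `any_six` is TRUE for every value except 73947, which is why the character-class attempt in the question cannot single out doubled digits. If only a doubled 6 is of interest rather than any doubled digit, the pattern `"66"` or `"6{2}"` would be enough.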