Tags: r, tidyr, stringr, stringi

Counting all occurrences of a string with str_count() gives an incorrect answer in R


    A <- c('C-C-C', 'C-C', 'C-C-C-C')

    library(stringr)
    B <- str_count(A, "C-C")
    df <- data.frame(A, B)

    A          B (expected)   B (actual)
    C-C-C      2              1
    C-C        1              1
    C-C-C-C    3              2

I am trying to count all the C-C transitions, but I am getting the wrong answer. Can someone suggest how to fix this?


Solution

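  • You expect matches to be allowed to overlap, which is not the case: str_count() counts non-overlapping matches. After it matches the first C-C in C-C-C, the search resumes at the following -, so the shared C is not reused. To count overlapping occurrences, use a lookahead, which checks for -C without consuming it: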

    str_count(A, "C(?=-C)")
    #[1] 2 1 3
    

    or, if the strings consist only of C's separated by -, count the - characters:

    str_count(A, "-")
    #[1] 2 1 3
    

    or in base R:

    lengths(gregexpr("C(?=-C)", A, perl=TRUE))
    #[1] 2 1 3
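
One caveat with the base R version: gregexpr() reports a string with no match as a single -1, so lengths() returns 1 rather than 0 for it. The following is a minimal sketch of a base R variant that counts only real match positions. The helper name count_matches and the extra test string "C" are assumptions added for illustration and are not part of the original question or answer.

```r
# Sample input from the question, plus one string that contains no "C-C".
A <- c("C-C-C", "C-C", "C-C-C-C", "C")

# Hypothetical helper: count regex matches per string using base R only.
count_matches <- function(x, pattern) {
  m <- gregexpr(pattern, x, perl = TRUE)
  # gregexpr() returns -1 when nothing matches, so count positive positions only.
  vapply(m, function(pos) sum(pos > 0), integer(1))
}

# Lookahead pattern: overlapping transitions are counted.
count_matches(A, "C(?=-C)")
#[1] 2 1 3 0

# Plain pattern: matches cannot overlap, which reproduces the counts from the question.
count_matches(A, "C-C")
#[1] 1 1 2 0

# For comparison, lengths() alone reports 1 for the string with no match.
lengths(gregexpr("C(?=-C)", A, perl = TRUE))
#[1] 2 1 3 1
```

If every input is known to contain at least one match, as in the question's data, the shorter lengths(gregexpr(...)) form gives the same result.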