A <- c('C-C-C', 'C-C', 'C-C-C-C')
library(stringr)
B <- str_count(A, "C-C")
df <- data.frame(A, B)
A         B (expected)   B (actual)
C-C-C     2              1
C-C       1              1
C-C-C-C   3              2
I am trying to count all the C-to-C transitions (every "C-C" pair, including pairs that share a C), but I am getting the wrong answer. Can someone suggest how to fix this?
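You expect the matches to overlap, but str_count counts only non-overlapping matches: after it matches "C-C", the search resumes after that match. To allow overlaps, use a lookahead, which checks for "-C" without consuming it: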
str_count(A, "C(?=-C)")
#[1] 2 1 3
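To see why the plain pattern undercounts, a minimal base-R sketch can list where each match starts. With "C-C" every match consumes both Cs, so the next search begins past the shared C; with the lookahead only the first C is consumed. The variable names below are illustrative only.

```r
x <- "C-C-C-C"

# Plain pattern: each match consumes "C-C", so the search resumes after it
plain_starts <- as.integer(gregexpr("C-C", x)[[1]])

# Lookahead: only the first "C" is consumed; "-C" is checked but not eaten
lookahead_starts <- as.integer(gregexpr("C(?=-C)", x, perl = TRUE)[[1]])

print(plain_starts)      # 1 5
print(lookahead_starts)  # 1 3 5
```

The C at position 3 is skipped by the plain pattern because it was already consumed as the end of the first match.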
Or count the "-" characters, since in your data each one sits between two Cs:
str_count(A, "-")
#[1] 2 1 3
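The dash shortcut agrees with the lookahead only when every "-" separates two Cs, as in the example data. A small base-R check on made-up strings (the extra inputs "C-A-C" and "C--C" are hypothetical) shows where the two approaches diverge:

```r
x <- c("C-C-C", "C-A-C", "C--C")

# Transitions: a "C" immediately followed by "-C"
transitions <- lengths(regmatches(x, gregexpr("C(?=-C)", x, perl = TRUE)))

# Dash count: every "-", whatever surrounds it
dashes <- lengths(regmatches(x, gregexpr("-", x, fixed = TRUE)))

print(transitions)  # 2 0 0
print(dashes)       # 2 2 2
```

If the strings can contain other letters or repeated separators, the lookahead is the safer choice.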
Or in base R:
lengths(gregexpr("C(?=-C)", A, perl=TRUE))
#[1] 2 1 3
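One caveat with the base-R version: gregexpr() returns -1 for a string with no match, and lengths() counts that -1 as one element. A sketch with an extra, hypothetical input "C" shows the problem and one possible fix using regmatches(), which returns character(0) for no match:

```r
A <- c("C-C-C", "C-C", "C-C-C-C", "C")

m <- gregexpr("C(?=-C)", A, perl = TRUE)

# gregexpr() returns -1 when there is no match, and lengths() counts it as 1
naive <- lengths(m)

# regmatches() turns a no-match into character(0), so its length is 0
counts <- lengths(regmatches(A, m))

print(naive)   # 2 1 3 1
print(counts)  # 2 1 3 0
```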