Tags: string, r, search, bioinformatics, dna-sequence

Count the number of times a sequence of letters occurs in a string in R


I have a sequence of nucleotides and I need to count the number of times the word "gaga" appears in it. This is what I have so far:

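dna <- c("a", "g", "c", "t")
N <- 16
# draw N nucleotides with replacement
x <- sample(dna, N, replace = TRUE)
# collapse the letters into a single string
x2 <- paste(x, collapse = "")
x2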

Here is an example output:

gtaggcctaattataa

Eventually, I am going to write a loop to run this 100 times and plot a histogram of the counts of the word "gaga". So my main question is: how can I write a function or some code that searches through the string x2 and counts the number of occurrences of the word "gaga"?

Any help would be appreciated! Thank you!


Solution

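  • See ?regex and ?gregexpr. gregexpr() returns the start positions of every match in each string, or -1 when there is no match, so the count per string is the length of that result (or 0 for -1):
    sapply( gregexpr( "gaga", c("gtaggcctaattataa", 
                                "gtaggcctaatgagaataa", 
                                "gagagaga") ) ,
            function(x) if( x[1]==-1 ){ 0 }else{ length(x) } )
    [1] 0 1 2
    The matches are non-overlapping, which is why "gagagaga" counts as 2 rather than 3.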
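
To tie this back to the loop and histogram mentioned in the question, here is a minimal sketch that wraps the gregexpr() approach in a function and runs it over 100 random sequences. The function name count_word, its overlap argument, and the seed are my own assumptions, not part of the original answer. With overlap = TRUE the pattern becomes a zero-width lookahead (perl = TRUE), so overlapping hits such as the three in "gagagaga" are all counted. It also assumes the word contains plain letters only, with no regex metacharacters.

```r
# Hypothetical helper: count occurrences of `word` in each element of `x`.
# overlap = TRUE wraps the word in a lookahead so overlapping hits count too.
count_word <- function(x, word = "gaga", overlap = FALSE) {
  pattern <- if (overlap) paste0("(?=", word, ")") else word
  hits <- gregexpr(pattern, x, perl = TRUE)
  # gregexpr() returns -1 for a string with no match
  vapply(hits, function(h) if (h[1] == -1) 0L else length(h), integer(1))
}

count_word(c("gtaggcctaattataa", "gtaggcctaatgagaataa", "gagagaga"))
# 0 1 2
count_word("gagagaga", overlap = TRUE)
# 3

# Simulate 100 random sequences of length N and count "gaga" in each one.
set.seed(1)  # arbitrary seed, only so the run is reproducible
dna <- c("a", "g", "c", "t")
N <- 16
counts <- replicate(100, {
  x2 <- paste(sample(dna, N, replace = TRUE), collapse = "")
  count_word(x2, overlap = TRUE)
})

# One bar per integer count
hist(counts,
     breaks = seq(-0.5, max(counts) + 0.5, by = 1),
     main = "Occurrences of \"gaga\" in 100 random sequences",
     xlab = "count")
```

With N = 16 most sequences will contain no "gaga" at all, so expect the histogram to be dominated by the zero bar; a larger N gives a more spread-out picture.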