Is there a way I can find number of variable sites for a pair of nucleotide sequences using R and also which sites are variable ?
For eg.
Seq1=ATGCCCCGTAATGC
Seq2=AGGCAACGTAAAGC
I want the number of variable sites as 5 and the positions 2,3,5,6,12 as variable. Is there a way to do this in R ?
Not sure whether Biostrings have any benefit on memory optimization. You simply use strsplit for short sequences as:
Seq1="ATGCCCCGTAATGC"
Seq2="AGGCAACGTAAAGC"
a <- unlist(strsplit(Seq1, ""))
b <- unlist(strsplit(Seq2, ""))
which(a!=b)
#[1] 2 5 6 12