I have the following vector v:
c("tactagcaatacgcttgcgttcggtggttaagtatgtataatgcgcgggcttgtcgt",
"tgctatcctgacagttgtcacgctgattggtgtcgttacaatctaacgcatcgccaa",
"gtactagagaactagtgcattagcttatttttttgttatcatgctaaccacccggcg")
I'm facing a very frustrating issue here. Each element of this vector is a DNA sequence. What I want to do is split each element into pairs of letters and count the occurrences of each pair. The desired output for the first element would be exactly this:
AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
3 2 2 4 1 0 6 3 0 6 4 7 7 2 5 4
This result is easily achieved using the function oligonucleotideFrequency. The problem is that this function will not apply over a list or a vector using sapply or lapply, and I don't understand what the problem is or how to fix it.
If I do:
oligonucleotideFrequency(DNAString(v[1]), width = 2)
it works and I get this output:
AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
3 2 2 4 1 0 6 3 0 6 4 7 7 2 5 4
But if I do:
v <- DNAString(v)
lapply(v, oligonucleotideFrequency(v, width = 2)
this is what I get:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘oligonucleotideFrequency’ for signature ‘"list"’
The same occurs with sapply.
If I check the class of v after applying the DNAString function, it returns "list", so I don't get what the problem is here.
Even if I do:
oligonucleotideFrequency(v[1], width = 2)
it returns:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘oligonucleotideFrequency’ for signature ‘"list"’
I'm totally lost here, please help. I've spent hours racking my brain over this. How can I fix this problem? I want to apply this function to the whole vector at once.
PS: The R package containing these functions is called Biostrings, and it can be downloaded and installed from Bioconductor.
Thanks in advance
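The errors in the question most likely come from two things, assuming v had become a list of DNAString objects (as the reported "list" class suggests). Single brackets on a list return a sub-list, not the element, and lapply expects the function itself as its second argument rather than the result of calling it. The sketch below uses only base R, with a plain list of strings and substr standing in for the DNAString list and oligonucleotideFrequency; those stand-ins are illustrative assumptions, not part of the original code.

```r
# A plain list standing in for the list of DNAString objects.
x <- list("tac", "tgc", "gta")

# Single brackets keep the list wrapper, so a function that has no
# method for "list" fails on x[1].
class(x[1])

# Double brackets extract the element itself.
class(x[[1]])

# lapply() takes the function itself (no parentheses) and forwards any
# extra named arguments to it for every element of the list.
res <- lapply(x, substr, start = 1, stop = 2)
```

By the same logic, oligonucleotideFrequency(v[[1]], width = 2) should work on a list of DNAString objects where v[1] does not.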
There are two ways to use the lapply function.
The first one is to provide a user-defined function and set all the arguments inside that function, as follows.
library(Biostrings)
v <- c("tactagcaatacgcttgcgttcggtggttaagtatgtataatgcgcgggcttgtcgt",
"tgctatcctgacagttgtcacgctgattggtgtcgttacaatctaacgcatcgccaa",
"gtactagagaactagtgcattagcttatttttttgttatcatgctaaccacccggcg")
lapply(v, function(x) oligonucleotideFrequency(DNAString(x), width = 2))
# [[1]]
# AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
# 3 2 2 4 1 0 6 3 0 6 4 7 7 2 5 4
#
# [[2]]
# AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
# 3 4 1 4 5 2 4 4 2 4 1 5 3 5 6 3
#
# [[3]]
# AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
# 2 4 4 4 3 3 2 4 2 4 1 3 7 1 3 9
The second one is to provide the function name and pass the remaining arguments through ..., as follows. With this option, each item in the list (in this case, v) automatically goes to the first argument of the function.
library(Biostrings)
v <- c("tactagcaatacgcttgcgttcggtggttaagtatgtataatgcgcgggcttgtcgt",
"tgctatcctgacagttgtcacgctgattggtgtcgttacaatctaacgcatcgccaa",
"gtactagagaactagtgcattagcttatttttttgttatcatgctaaccacccggcg")
v <- lapply(v, DNAString)
lapply(v, oligonucleotideFrequency, width = 2)
# [[1]]
# AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
# 3 2 2 4 1 0 6 3 0 6 4 7 7 2 5 4
#
# [[2]]
# AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
# 3 4 1 4 5 2 4 4 2 4 1 5 3 5 6 3
#
# [[3]]
# AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
# 2 4 4 4 3 3 2 4 2 4 1 3 7 1 3 9
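As a possible third option that avoids lapply altogether, Biostrings also provides the DNAStringSet container, and oligonucleotideFrequency accepts it directly. This is a minimal sketch, assuming Biostrings is installed. The name freq is only illustrative.

```r
library(Biostrings)

v <- c("tactagcaatacgcttgcgttcggtggttaagtatgtataatgcgcgggcttgtcgt",
       "tgctatcctgacagttgtcacgctgattggtgtcgttacaatctaacgcatcgccaa",
       "gtactagagaactagtgcattagcttatttttttgttatcatgctaaccacccggcg")

# Convert the whole character vector into one DNAStringSet object.
set <- DNAStringSet(v)

# On a DNAStringSet the result is a matrix: one row per sequence,
# one column per dinucleotide (AA, AC, ..., TT).
freq <- oligonucleotideFrequency(set, width = 2)
freq
```

The first row of freq should match the counts shown above for the first sequence. A matrix is often more convenient than a list if the counts are going to be compared across sequences.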