Here is my problem: I want to compute the average distance between occurrences of a known motif within a sequence, and then extend this to a list of sequences. The first part is done; the second part (extending to a list of sequences) is the problematic one. Here is how I do the first part:
source("motifOccurrence.R") #https://www.r-bloggers.com/calculate-the-average-distance-between-a-given-dna-motif-within-dna-sequences-in-r/
library("seqinr")
library("Biostrings") # provides readDNAStringSet()
df <- readDNAStringSet("X.fasta")
df2 <- df[[1]]
motif <- c("T", "C", "C", "A")
coord <- coordMotif(df2, motif)
motidist <- computeDistance(coord)
motidist
[1] 152
It appears that the first sequence in my FASTA file has an average distance of 152 nucleotides between two TCCA motifs. But I don't know how to automate this for every sequence in df.
Thanks in advance for the help.
Kévin
This is untested, but should work. sapply "climbs" each list element (we could also use lapply here).
sapply(df, FUN = function(x, motif) {
  computeDistance(coordMotif(x, motif))
}, motif = motif)
The result will be a vector. If you would like to keep it a list, use sapply(..., simplify = FALSE). Simplification is not done with lapply. Consider either behavior as a convenience. :)
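To see the difference between the two return types without needing the FASTA file, here is a small self-contained sketch. The helpers motif_starts() and mean_gap() are hypothetical stand-ins written for this example; they are not the coordMotif() and computeDistance() functions from motifOccurrence.R, which may define "distance" differently. The toy sequences are made up as well.

```r
# Hypothetical stand-ins for coordMotif()/computeDistance(); NOT the blog's code.
motif_starts <- function(seq, motif) {
  # fixed = TRUE matches the motif literally; overlapping hits are not reported
  hits <- gregexpr(motif, seq, fixed = TRUE)[[1]]
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

mean_gap <- function(starts) {
  # Mean difference between consecutive start positions;
  # NA when the motif occurs fewer than two times
  if (length(starts) < 2) NA_real_ else mean(diff(starts))
}

# Made-up toy sequences as plain character strings
seqs <- list(a = "TCCAGGTCCAGGGGTCCA", b = "TCCATCCA", c = "GGGG")

# sapply() simplifies the result to a named numeric vector
res_vec <- sapply(seqs, function(x) mean_gap(motif_starts(x, "TCCA")))

# lapply() always returns a list, one element per sequence
res_list <- lapply(seqs, function(x) mean_gap(motif_starts(x, "TCCA")))
```

Sequences with fewer than two motif occurrences yield NA here, which is worth deciding on up front when running over a whole FASTA file, since not every sequence is guaranteed to contain the motif twice.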