r, distance, dna-sequence

How to determine the average distance between occurrences of a known motif in a list of DNA sequences


Here is my problem: I want to compute the average distance between occurrences of a known motif within a sequence, and then extend this to a list of sequences. The first part is done; the second part (extending it to a list of sequences) is where I am stuck. Here is how I do the first part:

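# Helper functions coordMotif() and computeDistance() come from:
# https://www.r-bloggers.com/calculate-the-average-distance-between-a-given-dna-motif-within-dna-sequences-in-r/
source("motifOccurrence.R")
library("seqinr")
library("Biostrings")                # provides readDNAStringSet()
df <- readDNAStringSet("X.fasta")    # all sequences in the FASTA file
df2 <- df[[1]]                       # first sequence only
motif <- c("T", "C", "C", "A")
coord <- coordMotif(df2, motif)      # positions of the motif in the sequence
motidist <- computeDistance(coord)   # average distance between those positions
motidist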

[1] 152

It appears that the first sequence in my FASTA file has an average distance of 152 nucleotides between TCCA motifs. However, I don't know how to automate this for every sequence in df.

Thanks in advance for the help.

Kévin


Solution

  • This is untested, but should work. sapply iterates over ("climbs") each element of df and applies the function to it (we could also use lapply here).

    sapply(df, FUN = function(x, motif) {
      computeDistance(coordMotif(x, motif))
    }, motif = motif)
    

    The result will be a vector with one value per sequence, provided computeDistance returns a single number each time. If you would like to keep it as a list, use sapply(..., simplify = FALSE). lapply never simplifies and always returns a list. Consider either behavior a convenience. :)
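
To make the difference between sapply and lapply concrete, here is a minimal base-R sketch on toy data. The helpers motif_starts() and mean_gap() are hypothetical stand-ins for coordMotif() and computeDistance(); they are not the code from motifOccurrence.R. The sketch assumes that "distance" means the gap between consecutive motif start positions, which may differ from the original script's definition.

```r
# Hypothetical stand-in for coordMotif(): start positions of `motif`
# in a sequence given as a vector of single characters.
motif_starts <- function(seq_chars, motif) {
  n <- length(seq_chars)
  k <- length(motif)
  if (n < k) return(integer(0))
  starts <- seq_len(n - k + 1)
  hits <- vapply(
    starts,
    function(i) all(seq_chars[i:(i + k - 1)] == motif),
    logical(1)
  )
  starts[hits]
}

# Hypothetical stand-in for computeDistance(): mean gap between
# consecutive start positions (assumed definition); NA if fewer than two hits.
mean_gap <- function(pos) {
  if (length(pos) < 2) return(NA_real_)
  mean(diff(pos))
}

# Toy sequences (made up for illustration), each split into single characters.
seqs <- list(
  seq1 = strsplit("TCCAAATCCAGGGGTCCA", "")[[1]],
  seq2 = strsplit("AATCCATCCA", "")[[1]],
  seq3 = strsplit("ACGT", "")[[1]]
)
motif <- c("T", "C", "C", "A")

# sapply simplifies the per-sequence results to a named numeric vector.
avg_vec <- sapply(seqs, FUN = function(x, motif) {
  mean_gap(motif_starts(x, motif))
}, motif = motif)

# lapply applies the same function but always returns a list.
avg_list <- lapply(seqs, FUN = function(x, motif) {
  mean_gap(motif_starts(x, motif))
}, motif = motif)

print(avg_vec)
```

A sequence with fewer than two motif occurrences has no distance to average, so the sketch returns NA for it rather than failing; whether the original computeDistance() behaves the same way is not something the source states.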