Search k-mers of one file in k-mers of another file and count occurrences in Python

I have this function, which generates all possible k-mers over the four bases in Python:

import itertools

def generate_kmers(k):
    bases = ['A', 'C', 'T', 'G']  # in task (a) we should only write a function that generates k-mers of the four bases
    # itertools.product returns the Cartesian product of the input iterables; with repeat=k it yields
    # every length-k tuple of bases, and ''.join turns each tuple into a string
    kmer = [''.join(p) for p in itertools.product(bases, repeat=k)]
    return kmer

Now I want to go through a list of sequences from a FASTQ file (e.g. ['GTATACACTAGTCCAGGATGTGCTTCTTGTAGAAAAGTAAAACAATGGTTAAAAGATCACAATCTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN', 'CCTGTAGAGTCATAAAGACCTCTTGGGTCCATCCTAGAAATTTTTCAGCTGAGAATAACGGGTCTGTTTCAGTTATTGCTTCTACTATNNNNNNNNNNNNNNNNNNNNNNNNNNN']), count how often each k-mer returned by generate_kmers occurs in that list of sequences, and save the counts in a dictionary (e.g. {'AAAA': 2, 'AAAC': 1, ...}). First I tried to modify generate_kmers so that it returns all k-mers of the sequence file, and then to iterate over kmerSequences and kmerBases, but that didn't work.

  • You could try this with count:

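    import itertools
    k = 4  # k-mer length; seqs is your list of sequences
    mers4 = generate_kmers(k)
    # one dictionary per sequence; note that str.count only counts non-overlapping matches
    dcts = [{kmer: seq.count(kmer) for kmer in mers4} for seq in seqs]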


    import itertools
    import re
    k = 4  # k-mer length
    mers4 = generate_kmers(k)
    # function that returns a dictionary with the occurrences of each k-mer in the given sequence
    def dct_count(seq):
        return {mer: len(re.findall(mer, seq)) for mer in mers4}
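
  • Both str.count and re.findall count non-overlapping matches, so 'AAAAA' contains 'AAAA' only once by either method. If overlapping occurrences should count, and you want a single dictionary over all sequences as in the question, a sliding window is one option. This is only a sketch: count_kmers is a made-up helper name, and the two short sequences are invented for illustration.

```python
import itertools

def generate_kmers(k):
    # All 4**k strings of length k over the four bases.
    bases = ['A', 'C', 'T', 'G']
    return [''.join(p) for p in itertools.product(bases, repeat=k)]

def count_kmers(seqs, k):
    # Hypothetical helper: start every possible k-mer at 0 so that
    # k-mers that never occur still appear in the result.
    counts = dict.fromkeys(generate_kmers(k), 0)
    for seq in seqs:
        # Slide a window of width k over the sequence, one base at a time.
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            # Windows containing N (or any other non-base) are not keys and are skipped.
            if window in counts:
                counts[window] += 1
    return counts

# Toy input, not real FASTQ reads.
seqs = ['AAAAA', 'AAACN']
counts = count_kmers(seqs, 4)
```

    Here counts['AAAA'] is 2, because both overlapping windows in 'AAAAA' are counted, and counts['AAAC'] is 1. The window 'AACN' is ignored since it contains an N.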