python bioinformatics dna-sequence

Why does my code return the location of the first match for the whole list instead of each match's location?


This code is supposed to read a text file containing a genome and, given a pattern, return how many times the pattern occurs and the location of each occurrence. Instead, it returns the number of occurrences and the location of the first occurrence only. For example, in one run, instead of returning the locations of the 35 occurrences, it returned the first location 35 times.

# open the file with the original sequence
myfile = open('Vibrio_cholerae.txt')

# set the file to the variable Text to read and scan
Text = myfile.read()

# insert the pattern
Pattern = "TAATGGCT"

PatternLocations = []

def PatternCount(Text,Pattern):
    count = 0
    for i in range (len(Text)-len(Pattern)+1):
        if Text [i:i+len(Pattern)] == Pattern:
            count +=1
            PatternLocations.append(Text.index(Pattern))
    return count


# print the result of calling PatternCount on Text and Pattern.
print (f"Number of times the Pattern is repeated: {PatternCount(Text,Pattern)} time(s).")
print(f"List of Pattern locations: {PatternLocations}")

Solution

  • You wrote

    PatternLocations.append(Text.index(Pattern))
    

    Called with a single argument, .index does the following:

    Return the lowest index in S where substring sub is found
    

    so it returns the same first position on every call. You should write this instead:

    PatternLocations.append(i)
    

    because you already find each location yourself, without using index, through the check

    if Text [i:i+len(Pattern)] == Pattern:
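
Putting the fix together, here is a minimal sketch of a version that returns the locations directly instead of appending to a global list. The function name `pattern_locations` and the short `sample` string are made up for illustration; they are not part of the original code, and the sample is not taken from the Vibrio cholerae file.

```python
def pattern_locations(text, pattern):
    """Return the start index of every occurrence of pattern in text.

    Overlapping occurrences are included, as in the original loop.
    """
    locations = []
    width = len(pattern)
    for i in range(len(text) - width + 1):
        # i is already the position of the window being compared,
        # so it can be stored as the location of the match.
        if text[i:i + width] == pattern:
            locations.append(i)
    return locations


# Hypothetical short sequence, used only to illustrate the behaviour.
sample = "ATAATGGCTTTAATGGCTA"
hits = pattern_locations(sample, "TAATGGCT")

print(f"Number of times the Pattern is repeated: {len(hits)} time(s).")
print(f"List of Pattern locations: {hits}")

# For comparison: str.index always reports the lowest matching position,
# which is why the original list held the same number repeated.
first_only = sample.index("TAATGGCT")
```

Since the count is just the length of the returned list, a separate counter is no longer needed. With the real genome, `text` would be the string read from the file, as in the question.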