
How to search for a value in a list between given positions


I have a list with three values (strings) and a substring.

  1. Each string in the list needs to be searched for the given substring between positions 20 and 50, and printed out if there are more than 5 occurrences of the substring in that string.

  2. If a string lacks the substring, a message should be printed saying that the substring is missing from that list item.

The output should be (given my code below):

1 Enriched with SP1 binding sites
3 Contains no SP1 binding sites

My code:

seq_list = ["GGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGG", "GGGCGG", "BBBBBBB"]
binding_site = "GGGCGG"


for count, value in enumerate(seq_list, start=1):               
    if binding_site in value:
        sumSP = int(sum(s.count('GGCGG')for s in seq_list))
        if sumSP >20:
            print(count, "enriched with SP1 binding sites")

else:
    print(count,"No binding sites found.")

Output

So I've got two problems. First, I've scoured the internet for a simple way to search each string between positions 20 and 50, but I only managed to find how to search across the positions of the whole list (using slice). Second, my sumSP code doesn't work: it evaluates to true for my second string when it should be false, since only the first value in my list holds more than 5 binding sites.


Solution

  • The following code closely follows your code snippet. It calls str.find() twice: once to check whether the binding site occurs anywhere in the string, and once to check whether it occurs between positions 20 and 50.

    seq_list = ["GGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGG", "GGGCGG", "BBBBBBB"]
    binding_site = "GGGCGG"

    for count, value in enumerate(seq_list, start=1):
        if value.find(binding_site) != -1:
            # The site occurs somewhere; require at least one hit between positions 20 and 50.
            if value.find(binding_site, 20, 50) != -1:
                # Count sites in this string only, not across the whole list.
                sumSP = value.count(binding_site)
                if sumSP > 5:
                    print(count, "enriched with SP1 binding sites")
        else:
            # Pairs with the outer if: the string contains no binding site at all.
            print(count, "No binding sites found.")

    Output:

    1 enriched with SP1 binding sites
    3 No binding sites found.
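
As a complement, here is a minimal sketch of the same check using str.count() with its optional start and end arguments, which counts occurrences inside a range without slicing. It rests on some assumptions: positions are treated as 0-based Python indices with the end excluded, and "enriched" is read as "at least one site between positions 20 and 50 and more than 5 sites in the whole string", because a 30-character window cannot hold more than five non-overlapping 6-character sites. The names classify, start, end and min_total are hypothetical and not part of the original code.

```python
def classify(seq, site, start=20, end=50, min_total=5):
    """Return 'enriched', 'missing', or None for a single sequence.

    Hypothetical helper: the thresholds and the window are assumptions
    taken from the question, not a fixed rule.
    """
    total = seq.count(site)                  # occurrences in the whole string
    in_window = seq.count(site, start, end)  # occurrences fully inside seq[start:end]
    if total == 0:
        return "missing"
    if in_window > 0 and total > min_total:
        return "enriched"
    return None  # has sites, but does not meet the enrichment criteria


seq_list = ["GGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGG", "GGGCGG", "BBBBBBB"]
binding_site = "GGGCGG"

for number, seq in enumerate(seq_list, start=1):
    label = classify(seq, binding_site)
    if label == "enriched":
        print(number, "Enriched with SP1 binding sites")
    elif label == "missing":
        print(number, "Contains no SP1 binding sites")
```

With the sample list this prints a line for items 1 and 3 only. Counting per string (seq.count) rather than summing over seq_list is what keeps the result for one item from leaking into the others: a sum over the whole list yields the same total on every iteration of the loop.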