Tags: python, type-conversion, biopython

User input to check a DNA sequence for restriction sites with BioPython


I want to write a script that accepts a user-supplied restriction enzyme name (a string) and searches a given DNA sequence (also a string) for occurrences of that enzyme's recognition site. The name would be looked up in the library of restriction enzymes contained in the Bio.Restriction module. A very simple example:

from Bio.Restriction import *

sequence=('ACGGCTATCGATAACTG...')
enzyme=input('Enter the name of your restriction enzyme: ')
enzymeSite=Bio.Restriction.enzyme.site
enzymeSite in sequence
#True or False

The problem, of course, is that the variable enzyme is a str object, not the RestrictionType object whose site attribute I need.

type(enzyme)
<class 'str'>

type(Bio.Restriction.EcoRI)
RestrictionType

I tried importing the enzyme by name with importlib and __import__. The enzymes are classes rather than modules, however, so neither can import them directly.

i=__import__('Bio.Restriction',fromlist=[''])
dir(i)
#list of Bio.Restriction contents

i=__import__('Bio.Restriction.EcoRI',fromlist=[''])
Traceback (most recent call last):
  File "<pyshell#391>", line 1, in <module>
    i=__import__('Bio.Restriction.EcoRI',fromlist=[''])
ImportError: No module named 'Bio.Restriction.EcoRI'
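
The traceback above comes down to the difference between a module and an attribute of a module: the import machinery can load the first but not the second. Here is a minimal sketch of that distinction, using the standard library's collections.OrderedDict as a stand-in so it runs without Biopython. The helper name load_attribute is hypothetical, not part of any library.

```python
import importlib


def load_attribute(module_name, attr_name):
    """Import `module_name`, then return its attribute `attr_name`."""
    module = importlib.import_module(module_name)
    try:
        return getattr(module, attr_name)
    except AttributeError:
        raise ValueError(f"{module_name} has no attribute {attr_name!r}") from None


# Importing a class as if it were a module fails with ImportError.
try:
    importlib.import_module("collections.OrderedDict")
    import_failed = False
except ImportError:
    import_failed = True

# Importing the module and then reading the attribute works.
ordered_dict_class = load_attribute("collections", "OrderedDict")
print(import_failed, ordered_dict_class.__name__)  # True OrderedDict
```

Since Bio.Restriction.EcoRI is reachable as an attribute (see the type() call earlier), load_attribute("Bio.Restriction", "EcoRI") should behave the same way with Biopython installed, though I have only exercised the sketch against the standard library.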

I'm also fairly new to Python, so I didn't get too much from reading the Restriction source files.

There are obvious limitations to being forced to type the enzyme name directly into the code or the interpreter. One workaround is to have two Python scripts, where one prompts the user for the enzyme, rewrites the code of the other, and then imports its output. Another is to simply create a dictionary of all possible restriction enzymes and their sites. Both of these solutions are hideous. The ideal solution for me would be to convert the user-supplied string into the proper RestrictionType object, which could then be used to access the site. Thanks for reading, and I would appreciate any help with this problem.
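
For what it's worth, the dictionary idea does not have to be typed out by hand: a name-to-site table can be collected from whatever namespace holds the enzyme objects. The sketch below is only an illustration under stated assumptions. build_site_table and stand_in are hypothetical names, and the stand-in namespace replaces the real module so the code runs without Biopython.

```python
from types import SimpleNamespace


def build_site_table(namespace):
    """Collect {name: site} for every public attribute with a string `site`."""
    table = {}
    for name, obj in vars(namespace).items():
        if name.startswith("_"):
            continue  # skip private and dunder names
        site = getattr(obj, "site", None)
        if isinstance(site, str):
            table[name] = site
    return table


# Hypothetical stand-in for the real module (EcoRI cuts GAATTC, BamHI GGATCC).
stand_in = SimpleNamespace(
    EcoRI=SimpleNamespace(site="GAATTC"),
    BamHI=SimpleNamespace(site="GGATCC"),
    helper=len,  # has no `site` attribute, so it is skipped
)

site_table = build_site_table(stand_in)
user_text = "EcoRI"  # would come from input() in a real script
print(site_table[user_text] in "ACGGCGAATTCTATCGATAACTG")  # True
```

Passing the imported Bio.Restriction module instead of stand_in is the intended use, on the assumption that every enzyme object there exposes site as a string. I have not checked that against the full enzyme list.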


Solution

  • I've never used Biopython before, but I'm interested in bioinformatics, so I looked into it a bit. I can't guarantee that this is the optimal way to do this, seeing as it's rather strange, but it seems to work.

    I made a RestrictionBatch and added the enzyme to the batch as a string, then called batch.get() with the very same string to retrieve the RestrictionType object. As far as I can tell, the enzyme names are case-sensitive; I used EcoRI to test. I worked with your example:

    from Bio.Restriction.Restriction import RestrictionBatch

    sequence=('ACGGCGAATTCTATCGATAACTG...')

    # Read the enzyme name from user input.
    enzyme_name = input("Enter enzyme name:\n") # e.g. EcoRI
    print(type(enzyme_name)) # <class 'str'>

    # Get the RestrictionType object by name
    batch = RestrictionBatch()
    batch.add(enzyme_name)
    enzyme = batch.get(enzyme_name)
    print(type(enzyme)) # RestrictionType

    # Check whether the recognition site occurs in the sequence
    print(enzyme.site in sequence) # True
    

    Is this along the lines of what you're looking for?
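
The membership test in the example only answers True or False. If you want to know where the site occurs, a plain string scan may be enough. The helper below is a rough sketch that does not use Biopython at all. find_site_positions is a hypothetical name, and the sequence is the example sequence with a second GAATTC appended for illustration.

```python
def find_site_positions(sequence, site):
    """Return 0-based start indices of every match, overlapping ones included."""
    if not site:
        raise ValueError("site must not be empty")
    sequence = sequence.upper()
    site = site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are not missed.
        start = sequence.find(site, start + 1)
    return positions


positions = find_site_positions("ACGGCGAATTCTATCGATAACTGGAATTC", "GAATTC")
print(positions)  # [5, 23]
```

This scans only the strand as given and treats the site as a literal string, so it would miss sites written with ambiguity codes such as N. For a palindromic site like EcoRI's GAATTC, scanning one strand is sufficient.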