
Issue handling file from command line with biopython SeqIO


This is my first attempt at handling command-line arguments with something other than the quick and dirty sys.argv[], and at writing a more 'proper' Python script. For some reason I cannot figure out, it seems to object to how I'm trying to use the input file from the command line.

The script is meant to take an input file and some numerical indices, then slice out a subset region of the file. However, I keep getting an error saying that the variable I've assigned the input file to is not defined:

joehealey@7c-d1-c3-89-86-2c:~/Documents/Warwick/PhD/Scripts$ python slice_genbank.py --input PAU_06042014.gbk -o test.gbk -s 3907329 -e 3934427
Traceback (most recent call last):
  File "slice_genbank.py", line 70, in <module>
    sub_record = record[start:end]
NameError: name 'record' is not defined

Here's the code. Where am I going wrong? (I'm sure it's simple):

#!/usr/bin/python

# This script is designed to take a genbank file and 'slice out'/'subset'
# regions (genes/operons etc.) and produce a separate file.

# Based upon the tutorial at http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc44

# Set up and handle arguments:
from Bio import SeqIO
import getopt


def main(argv):
    record = ''
    start = ''
    end = ''
    try:
        opts, args = getopt.getopt(argv, 'hi:o:s:e:', [
                                                   'help',
                                                   'input=',
                                                   'outfile=',
                                                   'start=',
                                                   'end='
                                                   ]
                              )
        if not opts:
            print "No options supplied. Aborting."
            usage()
            sys.exit(2)
    except getopt.GetoptError:
        print "Some issue with commandline args.\n"
        usage()
        sys.exit(2)

    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit(2)
        elif opt in ("-i", "--input"):
            filename = arg
            record = SeqIO.read(arg, "genbank")
        elif opt in ("-o", "--outfile"):
            outfile = arg
        elif opt in ("-s", "--start"):
            start = arg
        elif opt in ("-e", "--end"):
            end = arg
    print("Slicing " + filename + " from " + str(start) + " to " + str(end))

def usage():
    print(
"""
This script 'slices' entries such as genes or operons out of a genbank,
subsetting them as their own file.

Usage:
python slice_genbank.py -h|--help -i|--input <genbank> -o|--output <genbank> -s|--start <int> -e|--end <int>"

Options:

-h|--help       Displays this usage message. No options will also do this.
-i|--input      The genbank file you which to subset a record from.
-o|--outfile    The file name you wish to give to the new sliced genbank.
-s|--start      An integer base index to slice the record from.
-e|--end        An integer base index to slice the record to.
"""
      )

#Do the slicing
sub_record = record[start:end]
SeqIO.write(sub_record, outfile, "genbank")

if __name__ == "__main__":
 main(sys.argv[1:])

It's also possible there's an issue with the SeqIO.write syntax, but I haven't got as far as that yet.

EDIT:

I also forgot to mention that when I use `record = SeqIO.read("file.gbk", "genbank")` and write the file name directly into the script, it works correctly.


Solution

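  • As said in the comments, your variable record is only defined inside the function main() (the same is true for start, end and outfile), so it is not visible to the rest of the program. The slicing code at module level also runs before main() is ever called, which is why the NameError is raised. You can either return the values like this: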

    def main(argv):
        ...
        ...
        return record, start, end, outfile
    

    Your call to main() can then look like this:

    record, start, end, outfile = main(sys.argv[1:])
    

    Alternatively, you can move the slicing and writing code into main() itself (as you did), so that everything runs in the same scope.

    (Another way is to define the variables in the main program and then use the global keyword in your function; this is, however, not recommended.)
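
A minimal sketch of the second approach, assuming the same option names as in the question. The helper name parse_args is made up for illustration, and help/usage handling is left out for brevity. It also converts start and end with int(), since getopt returns every argument as a string and slicing with strings would fail. SeqIO is imported inside main() only so that the parsing step can be tried without Biopython installed.

```python
import getopt


def parse_args(argv):
    """Parse the command line and return (infile, outfile, start, end).

    parse_args is a hypothetical helper name, not part of the original script.
    """
    infile = outfile = None
    start = end = None
    opts, _ = getopt.getopt(
        argv, "i:o:s:e:", ["input=", "outfile=", "start=", "end="]
    )
    for opt, arg in opts:
        if opt in ("-i", "--input"):
            infile = arg
        elif opt in ("-o", "--outfile"):
            outfile = arg
        elif opt in ("-s", "--start"):
            start = int(arg)  # getopt yields strings; slicing needs integers
        elif opt in ("-e", "--end"):
            end = int(arg)
    return infile, outfile, start, end


def main(argv):
    """Read, slice and write the record, all within one scope."""
    infile, outfile, start, end = parse_args(argv)
    # Imported here so parse_args() can be exercised without Biopython.
    from Bio import SeqIO

    record = SeqIO.read(infile, "genbank")
    sub_record = record[start:end]
    SeqIO.write(sub_record, outfile, "genbank")


# Parsing step only, using the arguments from the question.
parsed = parse_args(
    ["--input", "PAU_06042014.gbk", "-o", "test.gbk", "-s", "3907329", "-e", "3934427"]
)
print(parsed)
```

In a real script you would replace the demonstration at the bottom with the usual `if __name__ == "__main__":` guard calling `main(sys.argv[1:])`. Note that this needs `import sys` at the top, which the script in the question is missing.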