
Python shell freezes on reading (fasta) file


I am going to start off by showing the code I have so far:

def err(em):
    print(em)
    exit

def rF(f):
    s = ""
    try:
        fh = open(f, 'r')
    except IOError:
        e = "Could not open the file: " + f
        err(e)

    try:
        with fh as ff:
            next(ff)
            for l in ff:
                if ">" in l:
                    next(ff)
                else:
                    s += l.replace('\n','').replace('\t','').replace('\r','')
    except:
        e = "Unknown Exception"
        err(e)
    fh.close()
    return s

For some reason the Python shell (I am using 3.2.2) freezes whenever I try to read a file by typing:

rF("mycobacterium_bovis.fasta")

The conditionals in the rF function are to prevent reading each line that starts with a ">" token. These lines aren't DNA/RNA code (which is what I am trying to read from these files) and should be ignored.

I hope someone can help me out with this; I don't see my error.

As per the usual, MANY thanks in advance!

EDIT: *The problem persists!* I removed the error handling, which was a fancy addition anyway, but the shell still freezes whenever I attempt to read a file. This is my code now:

def rF(f):
    s = ""
    try:
        fh = open(f, 'r')
    except IOError:
        print("Err")

    try:
        with fh as ff:
            next(ff)
            for l in ff:
                if ">" in l:
                    next(ff)
                else:
                    s += l.replace('\n','').replace('\t','').replace('\r','')
    except:
        print("Err")

    fh.close()
    return s

Solution

  • You never defined e.
    As a result you get a NameError, which the bare except: silently hides.

    This is why it is good and healthy to specify the exception, e.g.:

    try: 
        print(e)
    except NameError as e: 
        print(e)
    

    In cases like yours, though, where you don't know in advance what the exception will be, you should at least display information about the error:

    import sys
    try:
        print(e)
    except: # catch *all* exceptions
        e = sys.exc_info()[1]
        print(e)
    

    Which, using the original code you posted, would have printed the following:

    name 'e' is not defined
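
If the message alone is not enough, the standard library's traceback module can print the full stack trace from inside the handler. The following is a minimal sketch, not part of the code above; describe_failure and undefined_name are hypothetical names, and the NameError is triggered on purpose.

```python
import traceback

def describe_failure():
    """Trigger a NameError on purpose and return a short description of it."""
    try:
        print(undefined_name)  # hypothetical name, intentionally not defined
    except Exception as exc:  # narrower than a bare except:
        traceback.print_exc()  # the full traceback goes to stderr
        return type(exc).__name__ + ": " + str(exc)

print(describe_failure())
```

Catching Exception rather than using a bare except: still reports unexpected errors, but lets KeyboardInterrupt and SystemExit through.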
    

    Edit based on updated information:
    Building up a string with repeated += can get quite slow for a large file, because each concatenation may copy everything accumulated so far.
    Consider instead writing the filtered information to another file, e.g.:

    def rF(f):
      with open(f,'r') as fin, open('outfile','w') as fou:
        next(fin)
        for l in fin:
          if ">" in l:
            next(fin)
          else:
            fou.write(l.replace('\n','').replace('\t','').replace('\r',''))
    

    I have tested that the above code works on a FASTA file based on the format specification listed at http://en.wikipedia.org/wiki/FASTA_format, using Python 3.2.2 [GCC 4.6.1] on linux2.
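
If the sequence is needed in memory rather than in a file, one alternative is to collect the cleaned lines in a list and join them once at the end. The sketch below is an assumption about what the function is meant to do, not the tested code above. It skips only lines that start with ">" and uses continue rather than next(), because the for loop has already consumed the header line and an extra next() would also drop the sequence line that follows it. The name read_fasta_sequence is hypothetical.

```python
def read_fasta_sequence(path):
    """Return all sequence lines of a FASTA file joined into one string."""
    parts = []
    with open(path, 'r') as fin:
        for line in fin:
            if line.startswith(">"):
                continue  # header line, not sequence data
            # split() with no arguments drops '\n', '\r', '\t' and spaces
            parts.append("".join(line.split()))
    return "".join(parts)
```

In an interactive shell, assigning the result (for example seq = read_fasta_sequence("mycobacterium_bovis.fasta")) keeps the shell from echoing the whole sequence back, which may matter for a large genome.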

    A couple of recommendations:

    • Start small. Get a simple piece working then add a step.
    • Add print() statements at trouble spots.
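
For this problem, those two recommendations could look like the sketch below: a stripped-down pass over the file that only counts lines and prints progress, so you can see whether the read itself finishes before adding the string handling back. The function name count_fasta_lines and the report_every parameter are hypothetical.

```python
def count_fasta_lines(path, report_every=100000):
    """Count header and non-header lines, printing progress along the way."""
    headers = 0
    other_lines = 0
    with open(path, 'r') as fin:
        for number, line in enumerate(fin, 1):
            if line.startswith(">"):
                headers += 1
            else:
                other_lines += 1  # sequence lines (and any blank lines)
            if number % report_every == 0:
                print("read", number, "lines so far")
    print("done:", headers, "headers,", other_lines, "other lines")
    return headers, other_lines
```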

    Also, consider including more information about the contents of the file you're attempting to parse. That may make it easier for us to help.