In Biopython, can you ‘print’ long output so that it is wrapped into multiple lines of a maximum length, as is common in the FASTA format?


I have been trying to take information from a GenBank file and print just the locus tag and translation, using the code below by xbello, which I modified.

from Bio import SeqIO

for rec in SeqIO.parse("file.gb", "genbank"):
    if rec.features:
        for feature in rec.features:
            # Only CDS features that carry a translation are printed.
            if feature.type == "CDS" and 'translation' in feature.qualifiers:
                # Qualifier values are lists, hence the [0].
                print('>' + feature.qualifiers['locus_tag'][0])
                print(feature.qualifiers['translation'][0])

This works, but it prints each translation sequence as one very long line (I assume up to the maximum line length Python allows). Is it possible to have the sequences formatted into multi-line blocks of about 60 characters per line, as you often see in .faa files, for example?

I have tried print(textwrap.fill(feature.qualifiers['translation'], width=60)) and print(textwrap.wrap(feature.qualifiers['translation'], width=60)).

That did not work, so I also tried assigning X = feature.qualifiers['translation'] and then calling print(textwrap.fill(X, width=60)).

Unsurprisingly, that did not work either. I am not sure which formatting functions work with print rather than with Xout.write, and I suspect I have not written this in a way that first takes the text from feature.qualifiers['translation'] and then wraps it to a width of 60.

I run this code as a script from cmd or PowerShell, using ">X.xx" to redirect the output to a file with the name and extension I want.


Solution

  • You could write a custom print function that takes a string as input, splits it into parts of 60 characters, and prints each part on its own line.

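    def custom_print(string):
        counter = 0
        res = ""
        for char in string:
            res += char
            counter += 1
            if counter == 60:
                # A full 60-character line is ready: print it and start over.
                print(res)
                counter = 0
                res = ""
        # Print whatever is left over (the last, shorter line).
        if res:
            print(res)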
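
As an alternative, the textwrap attempts from the question can probably be made to work. Qualifier values in Biopython are lists, so textwrap.fill needs the string at index [0] (as the print lines in the question already use) rather than the list itself. The following is a minimal sketch under that assumption. The helper name format_fasta and the sample locus tag and sequence are hypothetical, made up for illustration.

```python
import textwrap


def format_fasta(header, sequence, width=60):
    """Return a FASTA-style record: '>header' plus the sequence wrapped at `width`."""
    # textwrap.fill expects a single string. A sequence has no spaces, so it is
    # treated as one long word and broken every `width` characters.
    return ">" + header + "\n" + textwrap.fill(sequence, width=width)


# Hypothetical values standing in for feature.qualifiers['locus_tag'][0]
# and feature.qualifiers['translation'][0].
record = format_fasta("TAG_0001", "M" + "A" * 129)
print(record)
```

Inside the loop from the question, the call would look like print(format_fasta(feature.qualifiers['locus_tag'][0], feature.qualifiers['translation'][0])).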