Search code examples
biopythongenbank

Writing and saving GenBank files with biobython SeqIO module


I wand to safe some DNA sequences in genbank file format to include information about genes, domains, etc. I know how to create SeqRecord objects and include all information I wand to have in the file:

#my DNA sequence and encoded protein sequence of gene1  
genome_seq = 'ATTTTGTGCAGCCGAGAGCGCGAGCGAAGCGCTTAAAAAATTCCCCCGCTCTGTTCTCCGGCAGGACACAAAGTCATGCCGTGGAGACCGCCGGTCCATAACGTGCCAGGTAGAGAGAATCAATGGTTTGCAGCGTTCTTTCACGGTCATGCTGCTTTCTGCGGGTGTGGTGACCCTGTTGGGCATCTTAACGGAAGC'  
protein_seq = 'QQRILGVKLRLLFNQVQKIQQNQDP'  
#position of gene1  
start = 12  
end = start + len(protein_seq)  
#some information  
name = 'my_contig'  
bioproject = 'BodySites'  
sample_type='blood'  
taxonomy = ['Homo Sapiens']  
reference_prot_ID = 'YP_92845z2093857'  
#dictionaries with information for SeqFeature qualifiers and SeqRecord annotations  
dict1 = {'gene':'ORF1', 'ref_ID': reference_prot_ID, 'translation':protein_seq}  
dict2 = {'SOURCE': sample_type, 'ORGANISM': 'Human', 'Taxonomy':taxonomy}  
#create SeqFeature and SeqRecord  
f1 = SeqFeature(FeatureLocation(start, end, strand=1), type='domain', qualifiers=dict1)  
my_features = [f1]  
record = SeqRecord(Seq(genome_seq, alphabet=IUPAC.unambiguous_dna), id=name, name=name\  
                   description=bioproject, annotations=dict2, features = my_features)  
print(record)  
with open('/media/sf_Desktop/test.gb', 'w') as handle:  
        SeqIO.write(record, handle, 'genbank')

What I get printed on the screend for the SeqRecord object looks like this, where everything seems to be included:

ID: my_contig
Name: ma_contig
Description: BodySites
Number of features: 1
/SOURCE=blood
/ORGANISM=Human
/Taxonomy=['Homo Sapiens']
Seq('ATTTTGTGCAGCCGAGAGCGCGAGCGAAGCGCTTAAAAAATTCCCCCGCTCTGT...AGC', IUPACUnambiguousDNA())

But in the resulting file the information on SOURCE, ORGANISM and Taxonomy is missing:

LOCUS       my_contig                198 bp    DNA              UNK 01-JAN-1980
DEFINITION  BodySites.
ACCESSION   my_contig
VERSION     my_contig
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     domain          13..37
                     /gene="ORF1"
                     /ref_ID="YP_92845z2093857"
                     /translation="QQRILGVKLRLLFNQVQKIQQNQDP"
ORIGIN
        1 attttgtgca gccgagagcg cgagcgaagc gcttaaaaaa ttcccccgct ctgttctccg
       61 gcaggacaca aagtcatgcc gtggagaccg ccggtccata acgtgccagg tagagagaat
      121 caatggtttg cagcgttctt tcacggtcat gctgctttct gcgggtgtgg tgaccctgtt
      181 gggcatctta acggaagc
//

Can anyone help me how to include also the annotation information in the output file?
I found that for the GenBank.Record module it is possible to include all information and it looks very nice on the screen, but there is no information on how to save a Record object to a file...


Solution

  • OK, I found my mistake: all annotation titles need to be in lowercase letters. So changing 'SOURCE to 'source', 'ORGANISM' to 'organism' and so on, did the job.

    Cheers!