I wand to safe some DNA sequences in genbank file format to include information about genes, domains, etc. I know how to create SeqRecord objects and include all information I wand to have in the file:
#my DNA sequence and encoded protein sequence of gene1
genome_seq = 'ATTTTGTGCAGCCGAGAGCGCGAGCGAAGCGCTTAAAAAATTCCCCCGCTCTGTTCTCCGGCAGGACACAAAGTCATGCCGTGGAGACCGCCGGTCCATAACGTGCCAGGTAGAGAGAATCAATGGTTTGCAGCGTTCTTTCACGGTCATGCTGCTTTCTGCGGGTGTGGTGACCCTGTTGGGCATCTTAACGGAAGC'
protein_seq = 'QQRILGVKLRLLFNQVQKIQQNQDP'
#position of gene1
start = 12
end = start + len(protein_seq)
#some information
name = 'my_contig'
bioproject = 'BodySites'
sample_type='blood'
taxonomy = ['Homo Sapiens']
reference_prot_ID = 'YP_92845z2093857'
#dictionaries with information for SeqFeature qualifiers and SeqRecord annotations
dict1 = {'gene':'ORF1', 'ref_ID': reference_prot_ID, 'translation':protein_seq}
dict2 = {'SOURCE': sample_type, 'ORGANISM': 'Human', 'Taxonomy':taxonomy}
#create SeqFeature and SeqRecord
f1 = SeqFeature(FeatureLocation(start, end, strand=1), type='domain', qualifiers=dict1)
my_features = [f1]
record = SeqRecord(Seq(genome_seq, alphabet=IUPAC.unambiguous_dna), id=name, name=name\
description=bioproject, annotations=dict2, features = my_features)
print(record)
with open('/media/sf_Desktop/test.gb', 'w') as handle:
SeqIO.write(record, handle, 'genbank')
What I get printed on the screend for the SeqRecord object looks like this, where everything seems to be included:
ID: my_contig
Name: ma_contig
Description: BodySites
Number of features: 1
/SOURCE=blood
/ORGANISM=Human
/Taxonomy=['Homo Sapiens']
Seq('ATTTTGTGCAGCCGAGAGCGCGAGCGAAGCGCTTAAAAAATTCCCCCGCTCTGT...AGC', IUPACUnambiguousDNA())
But in the resulting file the information on SOURCE, ORGANISM and Taxonomy is missing:
LOCUS my_contig 198 bp DNA UNK 01-JAN-1980
DEFINITION BodySites.
ACCESSION my_contig
VERSION my_contig
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
domain 13..37
/gene="ORF1"
/ref_ID="YP_92845z2093857"
/translation="QQRILGVKLRLLFNQVQKIQQNQDP"
ORIGIN
1 attttgtgca gccgagagcg cgagcgaagc gcttaaaaaa ttcccccgct ctgttctccg
61 gcaggacaca aagtcatgcc gtggagaccg ccggtccata acgtgccagg tagagagaat
121 caatggtttg cagcgttctt tcacggtcat gctgctttct gcgggtgtgg tgaccctgtt
181 gggcatctta acggaagc
//
Can anyone help me how to include also the annotation information in the output file?
I found that for the GenBank.Record module it is possible to include all information and it looks very nice on the screen, but there is no information on how to save a Record object to a file...
OK, I found my mistake: all annotation titles need to be in lowercase letters. So changing 'SOURCE to 'source', 'ORGANISM' to 'organism' and so on, did the job.
Cheers!