Im using Biopython to try to retrieve the DNA sequence corresponding to protein of which I have a GI(71743840), from the NCBI page this is very easy, I just need to look for the refseq. My problem comes when coding it in python, using ncbi fetch utilities, I can't find a way to retrieve any field that would help me to go to DNA.
handle = Entrez.efetch(db="nucleotide", id=blast_record.alignments[0].hit_id, rettype="gb", retmode="text")
seq_record=SeqIO.read(handle,"gb")
There is a lot of information in seq_record.features, but there must be an easier and obvious way to do this, any help would be appreciated. Thnx!
You can try to access the annotations of the SeqRecord:
seq_record=SeqIO.read(handle,"gb")
nucleotide_accession = seq_record.annotations["db_source"]
In your case nucleotide_accession
is "REFSEQ: accession NM_000673.4"
Now look if you can parse those annotations. With only this test case:
nucl_id = nucleotide_accession.split()[-1]
handle = Entrez.efetch(db="nucleotide",
id=nucl_id,
rettype="gb",
retmode="text")
seq_record = SeqIO.read(handle, "gb")