How to extract and trim the fasta sequence using biopython

Hellow everybody, I am new to python struggling to do a small task using biopython.I have two file- one containing list of ids and associated number.eg

id.txt

tr_F6LMO6_F6LMO6_9LE  25
tr_F6ISE0_F6ISE0_9LE  17
tr_F6HSF4_F6HSF4_9LE  27
tr_F6PLK9_F6PLK9_9LE  19
tr_F6HOT8_F6HOT8_9LE  29

Second file containg a large fasta sequences.eg below

fasta_db.fasta

    >tr|F6LMO6|F6LMO6_9LEHG Transporter
    MLAPETRRKRLFSLIFLCTILTTRDLLSVGIFQPSHNARYGGMGGTNLAIGGSPMDIGTN
    PANLGLSSKKELEFGVSLPYIRSVYTDKLQDPDPNLAYTNSQNYNVLAPLPYIAIRIPIT
    EKLTYGGGVYVPGGGNGNVSELNRATPNGQTFQNWSGLNISGPIGDSRRIKESYSSTFYV

   >tr|F6ISE0|F6ISE0_9LEHG peptidase domain protein OMat str.  
    MPILKVAFVSFVLLVFSLPSFAEEKTDFDGVRKAVVQIKVYSQAINPYSPWTTDGVRASS
    GTGFLIGKKRILTNAHVVSNAKFIQVQRYNQTEWYRVKILFIAHDCDLAILEAEDGQFYK

   >tr|F6HSF4|F6HSF4_9LEHG hypothetical protein,  
    MNLRSYIREIQVGLLCILVFLMSLYLLYFESKSRGASVKEILGNVSFRYKTAQRKFPDRM
    LWEDLEQGMSVFDKDSVRTDEASEAVVHLNSGTQIELDPQSMVVLQLKENREILHLGEGS


   >tr|F6PLK9|F6PLK9_9LEHG Uncharacterized protein mano str. 
   MRKITGSYSKISLLTLLFLIGFTVLQSETNSFSLSSFTLRDLRLQKSESGNNFIELSPRD
   RKQGGELFFDFEEDEASNLQDKTGGYRVLSSSYLVDSAQAHTGKRSARFAGKRSGIKISG

I wanted to match the id from the first file with second file and print those matched seq in a new file after removing the length(from 1 to 25, in eq) .

Eg output[ 25(associated value with id,first file), aa removed from start, when id matched].

fasta_pruned.fasta

>tr|F6LMO6|F6LMO6_9LEHG Transporter     
LLSVGIFQPSHNARYGGMGGTNLAIGGSPMDIGTNPANLGLSSKKELEFGVSL
PYIRSVYTDKLQDPDPNLAYTNSQNYNVLAPLPYIAIRIPITEKLTYGGGVYV
PGGGNGNVSELNRATPNGQTFQNWSGLNISGPIGDSRRIKESYSSTFYV

Biopython cookbook was way above my head being new to python programming.Thanks for any help you can give.

I tried and messed up. Here is it.

from Bio import SeqIO
from Bio import Seq

f1 = open('fasta_pruned.fasta','w')


lengthdict = dict() 
with open("seqid_len.txt") as seqlengths:
    for line in seqlengths:
        split_IDlength  = line.strip().split(' ')
        lengthdict[split_IDlength[0]] = split_IDlength[1]


with open("species.fasta","rU") as spe:
    for record in SeqIO.parse(spe,"fasta"):
        if record[0] == '>' :
            split_header = line.split('|')
            accession_ID = split_header[1]
            if accession_ID in lengthdict:
                f1.write(str(seq_record.id) + "\n")
                f1.write(str(seq_record_seq[split_IDlength[1]-1:]))



f1.close()

Solution

Your code has almost everything except for a couple of small things which prevent it from giving the desired output:

Your file id.txt has two spaces between the id and the location. You take the 2nd element which would be empty in this case.
When the file is read it is interpreted as a string but you want the position to be an integer
```
lengthdict[split_IDlength[0]] = int(split_IDlength[-1])
```

Your ids are very similar but not identical, the only identical part is the 6 character identifier which could be used to map the two files (double check that before you assume it works). Having identical keys makes mapping much easier.

f1 = open('fasta_pruned.fasta', 'w')

fasta = dict()
with open("species.fasta","rU") as spe:
    for record in SeqIO.parse(spe, "fasta"):
        fasta[record.id.split('|')[1]] = record

lengthdict = dict() 
with open("seqid_len.txt") as seqlengths:
    for line in seqlengths:
        split_IDlength  = line.strip().split(' ')
        lengthdict[split_IDlength[0].split('_')[1]] = int(split_IDlength[1])

for k, v in lengthdict.items():
    if fasta.get(k) is None:
        continue
    print('>' + k)
    print(fasta[k].seq[v:])
    f1.write('>{}\n'.format(k))
    f1.write(str(fasta[k].seq[v:]) + '\n')

f1.close()

Output:

>F6LMO6
LLSVGIFQPSHNARYGGMGGTNLAIGGSPMDIGTNPANLGLSSKKELEFGVSLPYIRSVYTDKLQDPDPNLAYTNSQNYNVLAPLPYIAIRIPITEKLTYGGGVYVPGGGNGNVSELNRATPNGQTFQNWSGLNISGPIGDSRRIKESYSSTFYV
>F6ISE0
LPSFAEEKTDFDGVRKAVVQIKVYSQAINPYSPWTTDGVRASSGTGFLIGKKRILTNAHVVSNAKFIQVQRYNQTEWYRVKILFIAHDCDLAILEAEDGQFYK
>F6HSF4
YFESKSRGASVKEILGNVSFRYKTAQRKFPDRMLWEDLEQGMSVFDKDSVRTDEASEAVVHLNSGTQIELDPQSMVVLQLKENREILHLGEGS
>F6PLK9
IGFTVLQSETNSFSLSSFTLRDLRLQKSESGNNFIELSPRDRKQGGELFFDFEEDEASNLQDKTGGYRVLSSSYLVDSAQAHTGKRSARFAGKRSGIKISG
>F6HOT8