I have a problem; let me explain.
I have a FASTA file like this:
>seqA
AAAAATTTGG
>seqB
ATTGGGCCG
>seqC
ATTGGCC
>seqD
ATTGGACAG
and a dataframe:
seq name    New name seq
seqB        BOBO
seqC        JOHN
I simply want to rename a sequence ID in the FASTA file whenever the same seq name appears in my dataframe, replacing it with the corresponding new name. That would give:
New FASTA file:
>seqA
AAAAATTTGG
>BOBO
ATTGGGCCG
>JOHN
ATTGGCC
>seqD
ATTGGACAG
Thank you very much
Edit: I used this script:
blast = pd.read_table("matches_Busco_0035_0042.m8", header=None)
blast.columns = ["qseqid", "Busco_ID", "pident", "length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
repl = blast[blast.pident > 95]
print(repl)
# substitution dataframe
newfile = []
count = 0
for rec in SeqIO.parse("concatenate_0035_0042_aa2.fa", "fasta"):
    # get corresponding value for record ID from dataframe
    x = repl.loc[repl.seq == rec.id, "Busco_ID"]
    # change record, if not empty
    if x.any():
        rec.name = rec.description = rec.id = x.iloc[0]
        count += 1
    # append record to list
    newfile.append(rec)
# write list into new fasta file
SeqIO.write(newfile, "changedtest.faa", "fasta")
# tell us, how hard you had to work for us
print("I changed {} entries!".format(count))
And I got the following error:
Traceback (most recent call last):
  File "Get_busco_blast.py", line 74, in <module>
    x = repl.loc[repl.seq == rec.id, "Busco_ID"]
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'seq'
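It's easier to do this with something like Biopython.
First create a dictionary that maps each old name to its new name (old names go in the index, new names in the values):
names = pd.Series(df['New name seq'].values, index=df['seq name']).to_dict()
Now iterate over the records and write them back out: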
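from Bio import SeqIO
outs = []
for record in SeqIO.parse("orig.fasta", "fasta"):
    if record.id in names:
        record.id = names[record.id]
        # clear the description, which still starts with the old ID
        record.description = ""
    outs.append(record)
# SeqIO.write takes the records first, then the output file, then the format
SeqIO.write(outs, "new.fasta", "fasta")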
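As for the AttributeError in the edit: the column list assigned to blast.columns has no column called seq, so repl.seq cannot be resolved. The query IDs live in qseqid, so the lookup presumably needs to compare against that column instead. Below is a minimal sketch of the whole renaming step using only the standard library, in case it helps to see the logic without pandas or Biopython. The function names build_rename_map and rename_fasta are made up for this example, and the BLAST rows are invented values (only the column order follows the script above). It keeps the first qualifying hit per query, which mirrors x.iloc[0].

```python
import csv
import io


def build_rename_map(m8_text, min_pident=95.0):
    """Map qseqid -> Busco_ID for hits whose pident is above min_pident.

    Assumes tab-separated rows with qseqid, Busco_ID and pident as the
    first three columns, as in the column list from the question.
    """
    mapping = {}
    for row in csv.reader(io.StringIO(m8_text), delimiter="\t"):
        if not row:
            continue
        qseqid, busco_id, pident = row[0], row[1], float(row[2])
        # Keep only the first qualifying hit for each query ID.
        if pident > min_pident and qseqid not in mapping:
            mapping[qseqid] = busco_id
    return mapping


def rename_fasta(fasta_text, mapping):
    """Return (new FASTA text, number of renamed headers)."""
    out = []
    changed = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token of the header.
            parts = line[1:].split(None, 1)
            old_id = parts[0] if parts else ""
            if old_id in mapping:
                line = ">" + mapping[old_id]
                changed += 1
        out.append(line)
    return "\n".join(out) + "\n", changed


# FASTA from the question.
fasta_text = ">seqA\nAAAAATTTGG\n>seqB\nATTGGGCCG\n>seqC\nATTGGCC\n>seqD\nATTGGACAG\n"

# Invented BLAST rows for illustration; seqD falls below the 95 cutoff.
m8_text = (
    "seqB\tBOBO\t98.0\t9\t0\t0\t1\t9\t1\t9\t1e-5\t20.0\n"
    "seqC\tJOHN\t99.1\t7\t0\t0\t1\t7\t1\t7\t1e-4\t18.0\n"
    "seqD\tALICE\t80.0\t9\t2\t0\t1\t9\t1\t9\t1e-2\t12.0\n"
)

mapping = build_rename_map(m8_text)
new_fasta, changed = rename_fasta(fasta_text, mapping)
print(new_fasta, end="")
print("I changed {} entries!".format(changed))
```

With real files you would read the text from disk and write new_fasta back out. For large FASTA files, processing line by line from the file handle instead of holding everything in memory is probably the better choice.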