Sorting a .txt file by name in Python


I have a huge BLAST output file in tabular format. I want to sort the data by protein name, to see which sequences align to each protein. Say I have:

con19 sp|Q24K02|IDE_BOVIN 3
con19 sp|P35559|IDE_RAT   2
con15 sp|Q24K02|IDE_BOVIN 8
con15 sp|P14735|IDE_HUMAN 30
con16 sp|Q24K02|IDE_BOVIN 45
con16 sp|P35559|IDE_RAT   23

I'd like output in either of these two layouts (both are OK):

Layout 1 (protein name shown once per group):

sp|Q24K02|IDE_BOVIN con19 3
                    con15 8
                    con16 45
sp|P35559|IDE_RAT   con19 2
                    con16 23
sp|P14735|IDE_HUMAN con15 30

Layout 2 (protein name repeated on every line):

sp|Q24K02|IDE_BOVIN con19 3
sp|Q24K02|IDE_BOVIN con15 8
sp|Q24K02|IDE_BOVIN con16 45
sp|P35559|IDE_RAT   con19 2
sp|P35559|IDE_RAT   con16 23
sp|P14735|IDE_HUMAN con15 30

Here is what I tried:

    f1 = open('file.txt','r')
    lines=f1.readlines()
    for line in lines:
        a=sorted(lines)
        r=open('file.txt','w')
        r.writelines(a)
    f1.close

Solution

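  • The loop is the first problem: sorted(lines) already sorts the whole list, so calling it inside the loop just repeats the same sort and reopens (truncating) the output file once per line. The files are also never closed properly (f1.close without parentheses does nothing, and r is never closed), so the output may not be flushed reliably. Finally, without a key, sorted compares whole lines, and since every line starts with the contig name (con19, con15, ...), the result would be ordered by contig, not by protein. Read the lines, sort them once by the accession in the second column, then write them back: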

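    with open('file.txt') as f1:          # read all lines, then close the input
        lines = f1.readlines()
    # Sort once, keyed on the accession between the first two '|' characters
    # ('Q24K02' in 'sp|Q24K02|IDE_BOVIN'); sorted() is stable, so hits for the
    # same protein keep their original relative order.
    a = sorted(lines, key=lambda l: l.split('|')[1])
    with open('file.txt', 'w') as r:      # 'with' closes and flushes the output
        r.writelines(a)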
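The sorted call above groups rows by accession, but it cannot print the protein name only once per group, and the groups come out in accession order (HUMAN, RAT, BOVIN for the sample) rather than in the order the proteins first appear. Here is a minimal sketch that produces either requested layout, assuming whitespace-separated columns with the query ID first and the subject ID second, as in the sample; any further columns (real tabular BLAST output has more) are carried along. The helper names group_by_subject, format_once, format_repeated and regroup_file are hypothetical, not from any library.

```python
def group_by_subject(lines):
    """Group hit lines by subject ID, keeping proteins in first-seen order.

    Assumes whitespace-separated columns: query ID, subject ID, then any
    remaining fields (in the sample, a single number).
    """
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        query, subject, rest = fields[0], fields[1], fields[2:]
        groups.setdefault(subject, []).append([query] + rest)
    return groups


def format_repeated(groups):
    """Second layout: the subject ID repeated on every line."""
    width = max(map(len, groups), default=0)  # pad IDs to a common width
    return [
        f"{subject:<{width}} {' '.join(hit)}"
        for subject, hits in groups.items()
        for hit in hits
    ]


def format_once(groups):
    """First layout: the subject ID printed only on the first line of its group."""
    width = max(map(len, groups), default=0)
    out = []
    for subject, hits in groups.items():
        for i, hit in enumerate(hits):
            label = subject if i == 0 else ""  # blank label after the first hit
            out.append(f"{label:<{width}} {' '.join(hit)}")
    return out


def regroup_file(in_path, out_path):
    """Read a tabular hit file and write it in the first layout (hypothetical helper)."""
    with open(in_path) as f:
        groups = group_by_subject(f)
    with open(out_path, "w") as out:
        out.writelines(line + "\n" for line in format_once(groups))
```

For example, regroup_file('file.txt', 'grouped.txt') writes the first layout to a new (hypothetical) file instead of overwriting the input; swap format_once for format_repeated to get the second. Because the fields are re-joined with single spaces, tab-separated columns come out space-separated.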