python  command-line  biopython

Is there a way to alter my command to reformat my output file?


I have a BLAST output file. I ran a blastx command that outputs both "qseqid" and "sseqid":

QRv313_NP342_d0_h2_l9    YN13213
QRv313_NP9080_d0_h1_l1   YN5345
QRv313_NP123_d0_h1_l7    YN756
QRv313_NP123_d0_h1_l113  YN9768
QRv313_NP654_d0_h2_l6    YN432
QRv313_NP8_d0_h1_l1      YN3242
QRv313_NP756_d0_h1_l2    YN85686

I am writing a script (edited in nano on the command line) to obtain the following desired output:

NP342    YN13213
NP9080   YN5345
NP123    YN756
NP123    YN9768
NP654    YN432
NP8      YN3242
NP756    YN85686

I have written a script in nano that prints tab-delimited columns of my query and subject IDs, but I am having trouble moving forward from here. I am unsure how to modify the script to produce my desired output.

import sys
file_object = open(sys.argv[1])

for my_data in file_object:

  list =  my_data.split("\t")

  print (list [0], list [1])

Is there a way to alter my script so that I get the desired output?

Any suggestions would be greatly appreciated!


Solution

  • You can try:

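    import sys

    # The context manager closes the file when the loop finishes
    with open(sys.argv[1]) as file_object:
        for my_data in file_object:
            # Split the line into its tab-delimited columns
            a_list = my_data.split('\t')
            # Keep the second '_'-separated piece of the query ID (e.g. NP342)
            print(a_list[0].split('_')[1], a_list[1], sep='\t', end='')
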

    list is a built-in type, so avoid using it as a variable name. The code above splits each line on \t, then splits the first field on _ and keeps the second piece (index 1). It prints the two values separated by \t; end='' avoids printing a second newline, because the last field still carries the line's own trailing newline.
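
    If the file might contain blank lines or more than two columns, a slightly more defensive variant may help. This is only a sketch under assumptions: the helper names reformat_line and main are hypothetical, the input is assumed to be tab-delimited, and the wanted ID is assumed to always be the second '_'-separated piece of the query ID. It strips the newline explicitly instead of relying on end='', so the output stays one record per line even when extra columns follow the subject ID.

```python
import sys


def reformat_line(line):
    """Return 'NPxxx<TAB>sseqid' for one tab-delimited line, or None if malformed.

    Hypothetical helper: assumes column 1 is the query ID, column 2 is the
    subject ID, and the wanted part of the query ID is its second '_' piece.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return None  # blank line or line without a tab
    parts = fields[0].split("_")
    if len(parts) < 2:
        return None  # query ID does not contain an underscore
    return f"{parts[1]}\t{fields[1]}"


def main(path):
    # Read the file line by line and print only the lines that parse cleanly
    with open(path) as handle:
        for line in handle:
            result = reformat_line(line)
            if result is not None:
                print(result)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

    Saved under a name of your choice, it would be run with the BLAST output file as its only argument, and the result can be redirected to a new file with > on the command line.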