Tags: awk, sed, fasta

Extract multiple columns and put a newline between them


I have a file in the following format:

TRINITY_DN119001_c0_g1_i1   4   *   0   0   *   *   0   0   GAGCCTCCCTCATGAATGTACCAGCATTTACCTCATAAAGAGCT    *   XO:Z:NM 
TRINITY_DN119037_c0_g1_i1   4   *   0   0   *   *   0   0   TAAGATTAGGTTGTATTCCAG   *   XO:Z:NM 
TRINITY_DN119099_c0_g1_i1   4   *   0   0   *   *   0   0   AGGCAGGCGCTAAACGATTTGCATTTCTCTAATGATTACGCCAG    *   XO:Z:NM
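To confirm which column holds the sequence, you can number the fields of one line. This is a minimal sketch, assuming fields are separated by whitespace (awk's default field splitting); the sample line is copied from the question.

```shell
# Number the whitespace-separated fields of one line to see which
# column holds the sequence. The sample line is copied from the question;
# to check your own file, run the awk command on in.txt instead.
line='TRINITY_DN119037_c0_g1_i1   4   *   0   0   *   *   0   0   TAAGATTAGGTTGTATTCCAG   *   XO:Z:NM'

printf '%s\n' "$line" |
  awk 'NR == 1 { for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i }'
```

Line 10 of the output is the sequence; line 9 is the 0 just before it.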

I am trying to extract the 1st and 10th columns and write them to an output file in the following format:

>TRINITY_DN119001_c0_g1_i1
GAGCCTCCCTCATGAATGTACCAGCATTTACCTCATAAAGAGCT
>TRINITY_DN119037_c0_g1_i1
TAAGATTAGGTTGTATTCCAG
>TRINITY_DN119099_c0_g1_i1
AGGCAGGCGCTAAACGATTTGCATTTCTCTAATGATTACGCCAG

This is what I have tried so far:

# cut -d accepts a single character only; without -d, cut splits on tabs
cut -f1,10 in.txt > out.txt
sed 's/^/>/' out.txt

but I can't work out how to produce the output shown above.
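For comparison, the cut/sed idea can be completed: after cut, the ID and the sequence still sit on one line, so the separator between them has to become a newline before > is added to every other line. This is a minimal sketch, assuming fields are separated by runs of spaces and/or tabs, no line starts with whitespace, and every line has at least 10 fields; to_fasta is a hypothetical helper name, not an existing command.

```shell
# Hypothetical helper completing the cut/sed approach.
# Assumes fields are separated by spaces and/or tabs, no line starts
# with whitespace, and every line has at least 10 fields.
to_fasta() {
  tr -s ' \t' '\t\t' |  # squeeze each run of spaces/tabs into one tab
    cut -f1,10 |        # keep the ID (field 1) and the sequence (field 10)
    tr '\t' '\n' |      # move the sequence onto its own line
    sed 's/^/>/;n'      # add '>' to the ID line, then pass the next line through
}

# Usage: to_fasta < in.txt > out.txt
printf '%s\n' \
  'TRINITY_DN119001_c0_g1_i1   4   *   0   0   *   *   0   0   GAGCCTCCCTCATGAATGTACCAGCATTTACCTCATAAAGAGCT    *   XO:Z:NM' \
  'TRINITY_DN119037_c0_g1_i1   4   *   0   0   *   *   0   0   TAAGATTAGGTTGTATTCCAG   *   XO:Z:NM' |
  to_fasta
```

If a line has fewer than 10 fields, cut emits only the ID and the alternation between ID and sequence lines breaks, so this pipeline is more fragile than a single awk command.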


Solution

  • You may use awk:

    awk '{printf ">%s\n%s\n", $1, $10}' file

    Output:

    >TRINITY_DN119001_c0_g1_i1
    GAGCCTCCCTCATGAATGTACCAGCATTTACCTCATAAAGAGCT
    >TRINITY_DN119037_c0_g1_i1
    TAAGATTAGGTTGTATTCCAG
    >TRINITY_DN119099_c0_g1_i1
    AGGCAGGCGCTAAACGATTTGCATTTCTCTAATGATTACGCCAG
    

    However, note that the sequence is in the 10th column ($10) of your input, not the 9th; the 9th column is the 0 just before it.
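If the result should go straight to a file, as in the question, redirect the awk output. The sketch below also skips lines whose 10th field is missing or is a lone *; that filter is an assumption (the input looks like SAM, where * in the sequence column means no sequence was stored), so drop it if it does not apply to your data. sam_to_fasta is a hypothetical helper name.

```shell
# Hypothetical helper: print each record as a two-line FASTA entry,
# skipping lines with fewer than 10 fields or a '*' in field 10.
# Reads the files given as arguments, or stdin if there are none.
sam_to_fasta() {
  awk 'NF >= 10 && $10 != "*" { printf ">%s\n%s\n", $1, $10 }' "$@"
}

# Usage with the file names from the question:
#   sam_to_fasta in.txt > out.txt
printf '%s\n' \
  'TRINITY_DN119037_c0_g1_i1   4   *   0   0   *   *   0   0   TAAGATTAGGTTGTATTCCAG   *   XO:Z:NM' |
  sam_to_fasta
```

Because awk splits on any run of blanks by default, this works whether the columns are separated by tabs or by spaces.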