Tags: shell, unix, awk, find, cut

Extract column 'x' from multiple files and transpose it into one row per file, prefixed with the file name


I am trying to extract one column (in the example below, the third one, which holds the gene names) from multiple .txt files (file1.txt, file2.txt, etc.) and transpose each extracted column into a row of a new file.

Below is file1.txt:

contig_1    contig_1    geneX       ctg1_886;ctg1_887;ctg1_888
contig_2    contig_2    geneY       ctg1_886;ctg1_887;ctg1_888
contig_3    contig_3    geneZ       ctg1_886;ctg1_887;ctg1_888

I would like to have a summary.txt file which looks like:

file1 geneX geneY geneZ
file2 geneA geneY
.
.
.
etc. 

The number of rows may vary between files. I tried using awk without success.


Solution

  • Following glenn jackman's advice from the comments, a GNU awk solution could look like this:

    awk 'BEGIN {ORS=" "} BEGINFILE {name = FILENAME; sub(/\.txt$/, "", name); print name} {print $3} ENDFILE {printf("\n")}' file*.txt > summary.txt
    

    And a portable awk solution (only tested with GNU awk) could look like this. Unlike the BEGINFILE version, it skips empty files, because FNR==1 never matches for them:

    awk 'BEGIN {ORS=" "} FNR==1 {if (NR > 1) printf("\n"); name = FILENAME; sub(/\.txt$/, "", name); print name} {print $3} END {printf("\n")}' file*.txt > summary.txt
    

    Explanation

    There are several special patterns:

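    • BEGIN: its action is executed once, before any input is read. Here ORS (the output record separator) is set to a space, so each print ends with a space instead of a newline and every input row becomes a new column of the same output line. This is the transpose step.
    • END: its action is executed once, after all input has been read.
    • BEGINFILE and ENDFILE (GNU awk extensions): their actions are executed once before and once after each input file is processed. Here they print the file name and a newline, respectively. In the portable version, the condition FNR==1 (FNR is the record number within the current file) takes the place of BEGINFILE.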
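To see the portable version end to end, here is a minimal sketch that builds two small input files in a temporary directory and runs it. It assumes whitespace-separated input with the gene name in column 3. The contents of file2.txt are invented to match the desired output, and the temporary directory is purely for illustration.

```shell
#!/bin/sh
# Sketch: reproduce the question with two hypothetical input files
# and run the portable awk solution on them.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # clean up the scratch directory on exit
cd "$workdir"

# Hypothetical sample data: file1.txt mirrors the question, file2.txt is
# made up; rows per file differ on purpose (3 vs 2).
printf 'contig_1 contig_1 geneX ctg1_886\ncontig_2 contig_2 geneY ctg1_886\ncontig_3 contig_3 geneZ ctg1_886\n' > file1.txt
printf 'contig_1 contig_1 geneA ctg1_886\ncontig_2 contig_2 geneY ctg1_886\n' > file2.txt

# ORS=" " makes every print end with a space, so the $3 values of one file
# land on one line; FNR==1 starts a new line with the file name minus ".txt".
awk 'BEGIN {ORS=" "} FNR==1 {if (NR > 1) printf("\n"); name = FILENAME; sub(/\.txt$/, "", name); print name} {print $3} END {printf("\n")}' file*.txt > summary.txt

cat summary.txt
```

Each output line keeps a trailing space as a side effect of ORS=" "; if that matters, pipe the awk output through sed 's/ $//' before redirecting it to summary.txt.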