awk paste cut

Extracting same column from each file to one file


I have a dataset of 171 files, each in the following format:

CHR:POS   REF:ALT   BREED

6:85406127 T:A 0.333333
6:85406128 T:C 0
6:85406129 C:G 0.333333
6:85406130 T:G 0.833333

Desired output is

CHR:POS   REF:ALT   BREED BREED2 BREED3 ... 171st file

6:85406127 T:A 0.333333 0.33 0.5 .... 0.4
6:85406128 T:C NA 0.33 0.5 .... 0.4
6:85406129 C:G 0.333333 0.33 NA .... 0
6:85406130 T:G 0.833333 0.33 0.5 .... NA

The filenames contain the breed names. The first and second columns contain the same information in every file. How can I extract only the third column from each file while keeping all columns from the first file?

I moved the first file into another folder to exclude it from the extraction. The following command did not give the desired result:

cut -d " " -f3 *.txt | paste ../breedname.txt - > output.txt

I also tried the awk commands shown in these questions, but they did not work for my dataset.

Any help is welcomed!


Solution

  • Here is a very quick and dirty way of doing it:

    Assuming the rows are in the same order (and every file has the same number of rows):

    $ awk '(FNR==NR){a[FNR]=$0;next}
           {a[FNR]=a[FNR] FS $NF}
           END{for(i=1;i<=FNR;++i) print a[i]}' file1 file2 file3 ... filen
    

    If you want the header a bit cleaner (a running number is appended to each header field, e.g. BREED1, BREED2, ...):

    $ awk '(FNR==NR){a[FNR]=$0 (FNR==1?++c:"");next}
           {a[FNR]=a[FNR] FS $NF (FNR==1?++c:"")}
           END{for(i=1;i<=FNR;++i) print a[i]}' file1 file2 file3 ... filen
    

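    If the rows are not in the same order in every file, key on the first two columns and print in the row order of the first file (looping with for(i in a) would print the rows, header included, in an unspecified order). Keys that do not occur in file1 are not printed:

    $ awk '{key=$1 FS $2}
           (FNR==NR){a[key]=$0 (FNR==1?++c:""); k[++n]=key; next}
           {a[key]=a[key] FS $NF (FNR==1?++c:"")}
           END{for(i=1;i<=n;++i) print a[k[i]]}' file1 file2 file3 ... filen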
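The desired output in the question contains NA where a file has no value for a row, which the commands above do not produce: a missing row simply shifts the later columns to the left. The following is a minimal sketch of one way to handle that, not part of the original answer. It assumes that every file starts with a header line, that fields are whitespace-separated, and that the column headers should be taken from the filenames (directory and extension stripped). The function name merge_breeds is hypothetical.

```shell
# merge_breeds FILE...
# Hypothetical helper (name and header convention are assumptions):
# joins the third column of every FILE on the first two columns,
# writing NA where a file has no row for a given key.
merge_breeds() {
    awk '
        FNR == 1 {                        # header line of each input file
            nfile++
            name = FILENAME
            sub(/.*\//, "", name)         # strip the directory part
            sub(/\.[^.]*$/, "", name)     # strip the extension
            if (nfile == 1) hdr = $1 FS $2
            hdr = hdr FS name             # one header field per file
            next
        }
        {
            key = $1 FS $2
            if (!(key in seen)) {         # remember first-seen order of keys
                seen[key] = 1
                order[++n] = key
            }
            val[key, nfile] = $3          # value of this key in this file
        }
        END {
            print hdr
            for (i = 1; i <= n; i++) {
                line = order[i]
                for (f = 1; f <= nfile; f++)
                    line = line FS (((order[i], f) in val) ? val[order[i], f] : "NA")
                print line
            }
        }
    ' "$@"
}
```

It could be called as merge_breeds *.txt > ../output.txt, writing the result outside the directory so that it is not picked up by the glob on a later run. All values are held in memory until the END block, which should be acceptable for 171 files of moderate size but is worth keeping in mind for very large inputs.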