Search code examples
bashdata-cleaning

Joining multiple files iteratively in Bash


I have files in the following format:

in file0.txt

234 ABC
123 KJN
245 MDJ
274 FWX

in file1.txt

45 ABC
1 KJN
546 MDJ
456 FWX

similarly, I have 24 files in total from file0.txt to file23.txt I am trying to join all these files into one file based on the 2nd column as the key. Desired output is:

ABC 234 45
KJN 123 1
MDJ 245 546
FWX 274 456

I wrote something like this:

for(i=0;i<=23;i++)
 do
     if[ -e file$i ]
       then
             join -1 2 -2 2 file$i.txt file`expr $i +1`.txt > fileJoined.txt
             fileJoined.txt > file`expr $i +1`.txt
     fi
 done

The logic fails after the first run as the joined file structure is changed and the join tries to join it on second column . Please suggest how to correct this. Also is there any better way to achieve this? Please note I had carefully used a space in if [ -e file$i ]


Solution

  • Using awk:

    $ awk '{a[$2]=a[$2] (a[$2]==""?"":OFS) $1}END{for(i in a)print i,a[i]}' file1 file2
    

    Output:

    ABC 234 45
    FWX 274 456
    MDJ 245 546
    KJN 123 1
    

    The script stores records to memory so you should have enough memory to handle all required files. Explained:

    $ awk '{
        a[$2]=a[$2] (a[$2]==""?"":OFS) $1  # hash all records to a, index with $2
    }
    END {                                  # after hashing all records
        for(i in a)                        # go thru all of them in random order
            print i,a[i]                   # and output them
    }' file1 file2