
Unix - Need to cut a file which has multiple blanks as delimiter - awk or cut?


I need to get the records from a text file in Unix. The delimiter is multiple blanks. For example:

2U2133   1239  
1290fsdsf   3234

From this, I need to extract

1239  
3234

The delimiter for all records will always be 3 blanks.

I need to do this in a Unix script (.scr) and either write the output to another file or use it as input to a do-while loop. I tried the following:

while read readline  
do  
        read_int=`echo "$readline"`  
        cnt_exc=`grep "$read_int" ${Directory path}/file1.txt| wc -l`  
if [ $cnt_exc -gt 0 ]  
then  
  int_1=0  
else  
  int_2=0  
fi  
done < awk -F'  ' '{ print $2 }' ${Directoty path}/test_file.txt  

test_file.txt is the input file and file1.txt is a lookup file. But this approach does not work; it gives me syntax errors near awk -F.

I also tried writing the output to a file. The following worked on the command line:

more test_file.txt | awk -F'   ' '{ print $2 }' > output.txt

This works and writes the records to output.txt when run on the command line. But the same command does not work in the Unix script (it is a .scr file).

Please let me know where I am going wrong and how I can resolve this.

Thanks,
Visakh


Solution

  • It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i, that means 'ignore blank fields' or, equivalently, allow multiple separators between fields. If that's supported, use:

    cut -i -d' ' -f 2 data.file
    

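    If not (and it is not universal, and maybe not even widespread, since neither GNU nor Mac OS X cut has the option), then using awk is better and more portable. With -F' ' (or no -F at all), awk treats any run of blanks or tabs as a single separator, so the number of blanks between fields does not matter.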
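
    As a rough sketch of the difference, the snippet below extracts the second field both ways from the sample records in the question. The file sample.txt and the function names second_field_awk and second_field_cut are made up for illustration, and mktemp -d is assumed to be available (it is common but not guaranteed by POSIX). The cut variant squeezes runs of blanks with tr -s first, which is one possible workaround when cut has no ignore-blank-fields option.

```shell
#!/bin/sh
# Scratch directory and sample file are hypothetical, created only for this sketch.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '2U2133   1239\n1290fsdsf   3234\n' > "$tmpdir/sample.txt"

# awk's default field separator treats any run of blanks or tabs as one delimiter.
second_field_awk() {
    awk '{ print $2 }' "$1"
}

# Plain cut counts every single blank as a delimiter, so squeeze
# each run of blanks down to one blank before cutting.
second_field_cut() {
    tr -s ' ' < "$1" | cut -d' ' -f2
}

second_field_awk "$tmpdir/sample.txt"
second_field_cut "$tmpdir/sample.txt"
```

    Both calls should print 1239 and 3234 for this input. The tr and cut variant would break on lines that start with blanks, because cut would then see an empty first field; awk's default splitting ignores leading blanks.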

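    You need to pipe the output of awk into your loop, though. The shell's < redirection expects a file name, not a command, which is why done < awk ... gives a syntax error: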

    awk -F' ' '{print $2}' "${Directory_path}/test_file.txt" |
    while read -r readline
    do
        # Count the lines in the lookup file that contain this value.
        cnt_exc=$(grep -c "$readline" "${Directory_path}/file1.txt")
        if [ "$cnt_exc" -gt 0 ]
        then int_1=0
        else int_2=0
        fi
    done
    

    The only residual issue is whether the while loop runs in a sub-shell and therefore modifies only its own copy of the variables rather than your main shell script's variables.
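
    One way to sidestep the sub-shell question, sketched here under some assumptions, is to do what the question already tried on the command line: write the awk output to an intermediate file and redirect the loop's input from it. The scratch directory, the contents of file1.txt, and the counters found and missing are all invented for illustration; mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Hypothetical scratch files standing in for test_file.txt and file1.txt.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '2U2133   1239\n1290fsdsf   3234\n' > "$workdir/test_file.txt"
printf '1239\n9999\n' > "$workdir/file1.txt"

# Step 1: write the second field of every record to an intermediate file.
awk '{ print $2 }' "$workdir/test_file.txt" > "$workdir/output.txt"

# Step 2: redirect the loop's input from that file. No pipeline is involved,
# so in a POSIX shell the loop body runs in the current shell and the
# counters are still set after the loop ends.
found=0
missing=0
while read -r value
do
    if grep -q "$value" "$workdir/file1.txt"
    then found=$((found + 1))
    else missing=$((missing + 1))
    fi
done < "$workdir/output.txt"

echo "found=$found missing=$missing"
```

    With the sample data above this prints found=1 missing=1, since 1239 appears in the lookup file and 3234 does not. The cost is an extra file to create and clean up.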

    With bash, you can use process substitution:

    while read -r readline
    do
        # Count the lines in the lookup file that contain this value.
        cnt_exc=$(grep -c "$readline" "${Directory_path}/file1.txt")
        if [ "$cnt_exc" -gt 0 ]
        then int_1=0
        else int_2=0
        fi
    done < <(awk -F' ' '{print $2}' "${Directory_path}/test_file.txt")
    

    This leaves the while loop in the current shell, but arranges for the output of the command to appear as if from a file.

    The blank in ${Directory path} is not normally legal, unless it is another Bash feature I've missed out on; you also had a typo (Directoty) in one place.
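
    To illustrate the naming point, here is a small sketch that uses a legal variable name (Directory_path, with an underscore) and double-quotes the expansion so that a directory whose name contains a blank still reaches awk as one argument. The directory "my data" and the scratch location are hypothetical; mktemp -d is assumed to be available.

```shell
#!/bin/sh
# The directory name deliberately contains a blank to show why the
# expansion needs double quotes. All paths here are made up for the sketch.
base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
Directory_path="$base/my data"
mkdir -p "$Directory_path"
printf '2U2133   1239\n1290fsdsf   3234\n' > "$Directory_path/test_file.txt"

# Quoted: the whole path is passed to awk as a single argument.
result=$(awk '{ print $2 }' "${Directory_path}/test_file.txt")
echo "$result"
```

    Without the double quotes, the shell would split the path at the blank and awk would be handed two file names that do not exist.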