I want to read a file line by line and store just some values


I have a file in which the following content repeats n times:

>QDN;6135785008
-------------------------------------------------------------------------------
DN:;;;;;5785008;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
TYPE:;SINGLE;PARTY;LINE
SNPA:;613;;;SIG:;DT;;;;LNATTIDX:;N/A;;;;;;;;;;;;;
LINE;EQUIPMENT;NUMBER:;;;;;BSAC;;39;0;00;01;;;
LINE;CLASS;CODE:;;IBN;;;
IBN;TYPE:;STATION
CUSTGRP:;;;;;;;;BSA_POS;;;;;SUBGRP:;0;;NCOS:;1
CARDCODE:;;V5LOOP;;;;GND:;N;;PADGRP:;NPDGP;;BNV:;NL;MNO:;N
PM;NODE;NUMBER;;;;;:;;;;80
PM;TERMINAL;NUMBER;:;;;;2
OPTIONS:
CWT;DGT;DDN;NOAMA;
;
-------------------------------------------------------------------------------
>QDN;6160160260
-------------------------------------------------------------------------------
DN:;;;;;0160260;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
TYPE:;SINGLE;PARTY;LINE
SNPA:;616;;;SIG:;DT;;;;LNATTIDX:;N/A;;;;;;;;;;;;;
LINE;EQUIPMENT;NUMBER:;;;;;BSAC;;39;0;00;03;;;
LINE;CLASS;CODE:;;IBN;;;
IBN;TYPE:;STATION
CUSTGRP:;;;;;;;;BSA_POS;;;;;SUBGRP:;0;;NCOS:;15
CARDCODE:;;V5LOOP;;;;GND:;N;;PADGRP:;NPDGP;;BNV:;NL;MNO:;N
PM;NODE;NUMBER;;;;;:;;;;80
PM;TERMINAL;NUMBER;:;;;;4
OPTIONS:
CWT;3WC;DGT;DDN;NOAMA;
;
----

I want to read all the lines and store some values in four variables: number (the second column of lines starting with ">QDN"), type (from lines starting with "TYPE"), snpa (from lines starting with "SNPA"), and options (the line that follows each occurrence of "OPTIONS"). The output could be a text file with the values separated by semicolons (e.g. var1;var2;var3;var4).

I have the following code, which only partially works: I can't get all the variables together on one line. I also tried nesting another while loop inside the first one to detect the end of each block (the lone semicolon that separates the blocks of info), but that did not work either.

while IFS= read -r line || [[ -n "$line" ]]; read -r secondline; do
if [[ "$line" =~ ^'>QDN' ]]; then
    number=$(echo "$line" | awk -F ';' 'NF {print $2;}')                
elif [[ "$line" =~ ^'TYPE' ]]; then
    type=$(echo "$line" | awk -F ';' 'NF {print $2" "$3" "$4;}')    
elif [[ "$line" =~ ^'SNPA' ]]; then
    snpa=$(echo "$line" | awk -F ';' 'NF {print $2;}')  
elif [[ "$line" =~ ^'OPTIONS' ]]; then
    options=$(echo "${secondline}") 
fi  
echo $number";"$type";"$snpa";"$options         
done < "file.txt"

The output of the code above is rather confusing:

;613;CWT;3WC;DGT;DDN;NOAMA;SACB;ACT;I976;$;$;N;
;613;CWT;3WC;DGT;DDN;NOAMA;SACB;ACT;I976;$;$;N;
;613;CWT;DGT;DDN;NOAMA;
;613;CWT;DGT;DDN;NOAMA;
;613;CWT;DGT;DDN;NOAMA;
;613;CWT;DGT;DDN;NOAMA;
;616;CWT;DGT;DDN;NOAMA;
;616;CWT;DGT;DDN;NOAMA;
;616;CWT;DGT;DDN;NOAMA;
;616;CWT;DGT;DDN;NOAMA;
;616;DGT;ARTY LINE
;616;DGT;ARTY LINE
;616;DGT;ARTY LINE    

Could anyone help?


Solution

  • Repeated similar small snippets of Awk are often a sign that you should rewrite the whole script in Awk instead.

    The following assumes that OPTIONS always comes after the other fields. It's not hard to remove this restriction, but with it in place the code is extraordinarily simple.

    awk -F ';' 'BEGIN { OFS=";" }
       /^>QDN/ { number = $2 }
       /^TYPE/ { type = $2 " " $3 " " $4 }
       /^SNPA/ { snpa = $2 }
       /^OPTIONS/ { options = 1; next }
       options { print number, type, snpa, $0;
          number = type = snpa = options = "" }' file.txt
    

    You should probably remove the DOS carriage returns from your file separately, but if you need to cope with broken files too, it's easy to add NF { sub(/\r/, "") } as the first rule of the script, before the others.
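
To illustrate the carriage-return point, here is a minimal sketch that builds a small CRLF sample and runs the same script with a stripping rule in front. The temporary file and the shortened sample lines are assumptions made for the demonstration, and the rule is a slight variation on the one above: it is anchored to the end of the line and applies to every record. The overlapping text in the question's output (such as "ARTY LINE") is consistent with stray carriage returns, though that is an inference from the output rather than something the question states.

```shell
# Build a small sample with DOS (CRLF) line endings in a temporary file.
sample=$(mktemp)
printf '%s\r\n' \
  '>QDN;6135785008' \
  'TYPE:;SINGLE;PARTY;LINE' \
  'SNPA:;613;;;SIG:;DT' \
  'OPTIONS:' \
  'CWT;DGT;DDN;NOAMA;' \
  ';' > "$sample"

# The first rule removes a trailing carriage return from every record.
# Because sub() modifies $0, awk re-splits the fields, so $2, $3, $4 are clean too.
result=$(awk -F ';' 'BEGIN { OFS=";" }
   { sub(/\r$/, "") }
   /^>QDN/ { number = $2 }
   /^TYPE/ { type = $2 " " $3 " " $4 }
   /^SNPA/ { snpa = $2 }
   /^OPTIONS/ { options = 1; next }
   options { print number, type, snpa, $0;
      number = type = snpa = options = "" }' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

With this sample the script prints `6135785008;SINGLE PARTY LINE;613;CWT;DGT;DDN;NOAMA;`. Without the stripping rule, each captured value would keep its trailing carriage return, which makes a terminal overwrite the start of the line.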

    Demo: https://ideone.com/zP102J
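
If OPTIONS cannot be relied on to come last, one possible way to lift that restriction is to print each record only when the next ">QDN" line (or the end of the file) is reached. The sketch below is not part of the original answer: the helper function `emit`, the flag `grab`, and the reordered sample data are all hypothetical names and inputs chosen for illustration, and it assumes every block starts with a ">QDN" line.

```shell
# Sample where OPTIONS appears before TYPE and SNPA in the first block.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>QDN;6135785008
OPTIONS:
CWT;DGT;DDN;NOAMA;
;
TYPE:;SINGLE;PARTY;LINE
SNPA:;613;;;SIG:;DT
>QDN;6160160260
TYPE:;SINGLE;PARTY;LINE
SNPA:;616;;;SIG:;DT
OPTIONS:
CWT;3WC;DGT;DDN;NOAMA;
;
EOF

result=$(awk -F ';' 'BEGIN { OFS=";" }
   # Print the record collected so far (if any), then clear it.
   function emit() {
      if (number != "") print number, type, snpa, options
      number = type = snpa = options = ""
   }
   # A new block starts: flush the previous one and remember the number.
   /^>QDN/ { emit(); number = $2; grab = 0; next }
   # The line right after OPTIONS holds the option list.
   grab { options = $0; grab = 0; next }
   /^TYPE/ { type = $2 " " $3 " " $4 }
   /^SNPA/ { snpa = $2 }
   /^OPTIONS/ { grab = 1 }
   # Flush the last block, which has no following >QDN line.
   END { emit() }' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

For this sample the output is two lines, `6135785008;SINGLE PARTY LINE;613;CWT;DGT;DDN;NOAMA;` and `6160160260;SINGLE PARTY LINE;616;CWT;3WC;DGT;DDN;NOAMA;`. The trade-off is a slightly longer script in exchange for not depending on the field order within a block.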