
I want to read a file and store some variables with AWK


I have a file with the following content. It is the result of a query on a piece of equipment, so some inputs are expected not to be found in the database. The examples below are the results of a successful and an unsuccessful query. The second example does not contain all the information I want to capture into the variables, so I want to ignore that result and set the variables to null/empty values.

<INTLPO:ISV=PORTAB NTL="6130290095" VEM=NAO;
VECTURA - SS            BSA002             2020-09-12            09-32
INTLPO:ISV=PORTAB NTL="6130290095" VEM=NAO;
INTERROGACAO DE NUMERO TELEFONICO PARA PORTABILIDADE NUMERICA                   

  TIPO DE ENCAMINHAMENTO POR ASSINANTE
  NTL = 6130290095           OPC = S_INF    RNP = 551      CSP = 25
  EIP = S_INF
  CDO = 00961
  CNL = 61000                NUF = S_INF                   TPB = PREST
  CPT = NAO                  CRE = 125      NUE = S_INF
  DAT = 2014-04-16           HOR = 10:30:20.798609
  TBR = 25
  RST              MAN      RST              MAN      RST              MAN
  2%               934      3%               934      4%               934
  5%               934      6%               934      7%               934
  8%               934      9%               934      9090%            934
  0??%             934      90??%            934      0?0%             934


  TOTAL DE NUMEROS ASSOCIADOS AO SERVICO: 1
<INTLPO:ISV=PORTAB NTL="6160150178" VEM=NAO;
VECTURA - SS            BSA002             2020-09-12            09-32
INTLPO:ISV=PORTAB NTL="6160150178" VEM=NAO;
INTERROGACAO DE NUMERO TELEFONICO PARA PORTABILIDADE NUMERICA                   

  ME:  NENHUM NUMERO CADASTRADO ATENDE AS ESPECIFICACOES
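One way to spot the unsuccessful queries is to key on the "ME:" line, which appears only in the failed block above. The sketch below assumes that marker is present in every failed query (only one failed sample is shown, so this is a guess), and the function name classify_blocks is hypothetical. It prints each queried number followed by found or not_found:

```shell
# Hypothetical helper: for each <INTLPO block, print the queried number and
# whether the query succeeded. Assumption: a failed query is one that
# contains a line starting with "ME:" (as in the second sample block).
classify_blocks() {
  awk '
    /^<INTLPO/ {
      if (n) print num, status   # flush the previous block
      n++; num = ""; status = "found"
    }
    /^INTLPO/    { num = $2; gsub(/[^0-9]/, "", num) }   # NTL="..." -> digits
    /^[ \t]*ME:/ { status = "not_found" }
    END { if (n) print num, status }
  ' "$@"
}
```

Run on the sample data above (for example classify_blocks input.tx), it prints "6130290095 found" and "6160150178 not_found".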

I have the following code, which partially works. The result is still messy: lines are repeated, and some contain wrong values.

awk -F ' ' 'BEGIN { OFS="," }
            /^VECTURA/ { equipment = $4; data = $5 }
            /^INTLPO/ { numero = $2}
            /^\s*NTL/ { ntl = $3 ; opc = $6; rnp = $9; csp = $12}
            /^\s*EIP/ { eip = $3}
            /^\s*CDO/ { cdo = $3}
            /^\s*CNL/ { cnl = $3; nuf = $6; tpb = $9}
            /^\s*CPT/ { cpt = $3; cre = $6; nue = $9}
            /^\s*DAT/ { dat = $3; hor = $6}
            /^\s*TBR/ { tbr = $3}
            /^\s*RST/ { man = $2; next}
            { print data, equipment, numero, ntl, opc, rnp, csp, eip, cdo, cnl, nuf, tpb, cpt, cre, nue, dat, hor, tbr, man}' input.tx >> output.txt

Result:

2020-09-12,BSA002,6160150536,,,,,,,,,,,,,,,,
2020-09-12,BSA002,6160150536,,,,,,,,,,,,,,,,
2020-09-12,BSA002,6130290095,,,,,,,,,,,,,,,,
2020-09-12,BSA002,6130290095,,,,,,,,,,,,,,,,
2020-09-12,BSA002,6130290095,,,,,,,,,,,,,,,,
2020-09-12,BSA002,6130290095,,,,,,,,,,,,,,,,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,,,,,,,,,,,,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,,,,,,,,,,,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,,,,,,,,,,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,,,,,,,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,,,,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6160150178,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6160150178,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6160150178,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6160150178,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6160150178,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6160150178,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6160150178,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN

Note that the NTL value 6130290095 (variable ntl) is wrongly associated with the number 6160150178 (variable numero) in the last lines of the output above.
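Both symptoms come from how awk runs rules: an action without a pattern (the final { print ... } in the code above) executes for every input line that reaches it, and awk variables are global, so they keep their last value until something reassigns them. A minimal illustration with made-up input:

```shell
# Made-up input: /b/ only updates x, but the pattern-less second rule
# prints on every line, and x keeps its old value on the line after "b".
printf 'a\nb\nc\n' | awk '/b/ { x = $1 } { print "x=" x }'
# x=
# x=b
# x=b
```

The line for "c" repeats the value captured from "b", just as the 6160150178 lines above repeat the NTL captured from the previous block.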

How can I fix this? I have tried some AWK conditional statements, without success. As output I would like just one line per record, like some of the lines in the output above. Thanks a lot.


Solution

  • If you only want to set numero when it is not already set, add a test on numero before assigning it, e.g. the pattern /^INTLPO/ && !numero { numero = $2 }.
    After reading your comment I changed my solution. As I now understand it, you don't want one record combining the results of all blocks; you want one output line for each block processed. Each new block starts with <INTLPO.
    This solution clears all values at the start of each new block (unnecessary for the first block, but harmless).
    The results of a block are printed when the next block is found, and at the end of the file for the last block.

    awk 'function newrecord() {
            recordnumber++;
            data=equipment=numero=ntl=opc=rnp=csp=eip=cdo="";
            cnl=nuf=tpb=cpt=cre=nue=dat=hor=tbr=man="";
         }
         function printrecord() {
             print data, equipment, numero, ntl, opc, rnp, csp, eip,
                   cdo, cnl, nuf, tpb, cpt, cre, nue, dat, hor, tbr, man;
         }
    
         BEGIN { OFS="," }
                /^<INTLPO/ { if (recordnumber) printrecord(); newrecord(); }
                /^VECTURA/ { equipment = $4; data = $5 }
                /^INTLPO/ { numero = $2}
                /^\s*NTL/ { ntl = $3 ; opc = $6; rnp = $9; csp = $12}
                /^\s*EIP/ { eip = $3}
                /^\s*CDO/ { cdo = $3}
                /^\s*CNL/ { cnl = $3; nuf = $6; tpb = $9}
                /^\s*CPT/ { cpt = $3; cre = $6; nue = $9}
                /^\s*DAT/ { dat = $3; hor = $6}
                /^\s*TBR/ { tbr = $3}
                /^\s*RST/ { man = $2; next}
                END { printrecord(); }
          ' input.tx
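Below is a self-contained variant of the program above, wrapped in a shell function so it can be tested; the name portab_to_csv is hypothetical. Three changes are assumptions, not part of the answer: [ \t]* replaces \s, which is a GNU awk extension that other implementations (mawk, for example) may not support; numero keeps only the digits of NTL="...", matching the plain numbers shown in the question's output; and END prints only if at least one block was seen, so an empty input produces no line of bare commas.

```shell
# Sketch based on the answer above, wrapped in a hypothetical shell function.
# Assumed changes: [ \t]* instead of the GNU-only \s, digits-only numero,
# and no output at END when the input contained no <INTLPO block.
portab_to_csv() {
  awk '
    # Clear every field at the start of a new block.
    function newrecord() {
      recordnumber++
      data = equipment = numero = ntl = opc = rnp = csp = eip = cdo = ""
      cnl = nuf = tpb = cpt = cre = nue = dat = hor = tbr = man = ""
    }
    # Emit one CSV line for the block collected so far.
    function printrecord() {
      print data, equipment, numero, ntl, opc, rnp, csp, eip,
            cdo, cnl, nuf, tpb, cpt, cre, nue, dat, hor, tbr, man
    }
    BEGIN { OFS = "," }
    /^<INTLPO/    { if (recordnumber) printrecord(); newrecord() }
    /^VECTURA/    { equipment = $4; data = $5 }
    /^INTLPO/     { numero = $2; gsub(/[^0-9]/, "", numero) }
    /^[ \t]*NTL/  { ntl = $3; opc = $6; rnp = $9; csp = $12 }
    /^[ \t]*EIP/  { eip = $3 }
    /^[ \t]*CDO/  { cdo = $3 }
    /^[ \t]*CNL/  { cnl = $3; nuf = $6; tpb = $9 }
    /^[ \t]*CPT/  { cpt = $3; cre = $6; nue = $9 }
    /^[ \t]*DAT/  { dat = $3; hor = $6 }
    /^[ \t]*TBR/  { tbr = $3 }
    /^[ \t]*RST/  { man = $2; next }
    END { if (recordnumber) printrecord() }
  ' "$@"
}
```

Usage, mirroring the question: portab_to_csv input.tx >> output.txt. On the sample data the failed query yields a line with only the date, equipment and number filled in, and no value leaks over from the previous block.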