
UNIX: Extracting only the information I need


I have the following content in a file, and I need to extract certain parts of it into another file to make the analysis easier.


saimptlogi_1~20170208022514~procRTLFHead~~103~RET-0103: generic function processing error~DATAUNEXPECTEDSTOREDAY on FHEAD record at line 0000000001 in /oretail/apprms/mmhome/data/in/RTLOG_4403_20170115010230_1.dat
saimptlogi_1~20170208022549~procRTLFHead~~103~RET-0103: generic function processing error~DATAUNEXPECTEDSTOREDAY on FHEAD record at line 0000000001 in /oretail/apprms/mmhome/data/in/RTLOG_4189_20170122010240_1.dat
saimptlogi_1~20170208022555~procRTLFHead~~103~RET-0103: generic function processing error~DATAUNEXPECTEDSTOREDAY on FHEAD record at line 0000000001 in /oretail/apprms/mmhome/data/in/RTLOG_4403_20170116010200_1.dat
saimptlogi_1~20170208022556~procRTLFHead~~103~RET-0103: generic function processing error~DATAUNEXPECTEDSTOREDAY on FHEAD record at line 0000000001 in /oretail/apprms/mmhome/data/in/RTLOG_4189_20170108010210_1.dat
saimptlogi_1~20170208022610~procRTLFHead~~103~RET-0103: generic function processing error~DATAUNEXPECTEDSTOREDAY on FHEAD record at line 0000000001 in /oretail/apprms/mmhome/data/in/RTLOG_4147_20170101010223_1.dat
saimptlogi_1~20170208022643~procRTLFHead~~103~RET-0103: generic function processing error~DATAUNEXPECTEDSTOREDAY on FHEAD record at line 0000000001 in /oretail/apprms/mmhome/data/in/RTLOG_4189_20170107010206_1.dat
saimptlogi_1~20170208022703~procRTLFHead~~103~RET-0103: generic function processing error~STOREDAYNOTREADYTOBELOAD on FHEAD record at line 0000000001 in /oretail/apprms/mmhome/data/in/RTLOG_4549_20170126010247_7.dat
saimptlogi_1~20170208022707~procRTLFHead~~103~RET-0103: generic function processing error~DATAUNEXPECTEDSTOREDAY on FHEAD record at line 0000000001 in /oretail/apprms/mmhome/data/in/RTLOG_4189_20170114010259_1.dat
saimptlogi_1~20170208022736~procRTLFHead~~103~RET-0103: generic function processing error~DATAUNEXPECTEDSTOREDAY on FHEAD record at line 0000000001 in /oretail/apprms/mmhome/data/in/RTLOG_4403_20170108010211_1.dat

I want to extract the error (DATAUNEXPECTEDSTOREDAY or STOREDAYNOTREADYTOBELOAD), the store (4403 in RTLOG_4403_20170108010211_1) and the date (20170108, the first eight digits of the timestamp in RTLOG_4403_20170108010211_1) into another file, and I need the output to look like this:

Example:

  • DATAUNEXPECTEDSTOREDAY 4403 20170108
  • STOREDAYNOTREADYTOBELOAD 4549 20170126

I've already written a command that extracts the store and the date from the RTLOG file names themselves, but it would be better to extract them directly from this log file.

My command:

    ls {RTLOG*.failed,RTLOG*.rej} | awk -F'|' '{gsub("_"," "); print substr($0,7,13), $4}'

Thank you in advance.
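For comparison, here is a small sketch of what the command in the question does to a single file name. It feeds a sample name through printf instead of ls (so no RTLOG files need to exist) and wraps the same awk program in a shell function; the name rtlog_store_date is hypothetical. Because substr($0,7,13) relies on fixed character positions, this approach assumes the store number is always four digits.

```shell
# Hypothetical helper wrapping the awk program from the question.
rtlog_store_date() {
  # gsub() turns "RTLOG_4403_20170108010211_1.failed" into
  # "RTLOG 4403 20170108010211 1.failed"; characters 7 to 19 are then
  # "4403 20170108". No "|" occurs in the name, so $4 is empty and
  # print only appends a trailing space.
  awk -F'|' '{gsub("_"," "); print substr($0,7,13), $4}'
}

# printf stands in for ls, so no real files are needed.
printf '%s\n' 'RTLOG_4403_20170108010211_1.failed' | rtlog_store_date
```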


Solution

  • @Pedro: Try:

    awk '{match($0,/DATAUNEXPECTEDSTOREDAY|STOREDAYNOTREADYTOBELOAD/);if(substr($0,RSTART,RLENGTH)){A=substr($0,RSTART,RLENGTH)};match($0,/RTLOG_.*\.dat/);if(substr($0,RSTART,RLENGTH)){split(substr($0,RSTART,RLENGTH), Q,"_");print A OFS Q[2] OFS substr(Q[3],1,8)}}'  OFS="|"   Input_file
    

    Here I am using awk's match() function. The first match() looks for either string, DATAUNEXPECTEDSTOREDAY or STOREDAYNOTREADYTOBELOAD. When a regex matches, match() sets RSTART (where the match starts) and RLENGTH (its length); when it does not match, RSTART is 0 and RLENGTH is -1, so substr($0,RSTART,RLENGTH) is non-empty only on a successful match. In that case the matched error string is stored in variable A.

    The second match() looks for RTLOG_.*\.dat to pick out the file-name part of the line, e.g. "RTLOG_4147_20170101010223_1.dat". If it is found, split() breaks that string into an array named Q using "_" as the delimiter. The program then prints A, Q[2] and substr(Q[3],1,8), separated by OFS: Q[2] is the store number (4403, 4189 and so on), and substr(Q[3],1,8) keeps only the first 8 characters of the timestamp, i.e. the date (20170108 in RTLOG_4403_20170108010211_1), as the OP requested. Because OFS is set to "|", the fields come out pipe-separated; remove OFS="|" (awk's default OFS is a single space) to get exactly the space-separated format shown in the question.
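To see the RSTART/RLENGTH behaviour the explanation relies on, here is a small sketch that runs match() on a shortened sample line (the /data/in/ path is shortened for brevity) and on a line that contains no file name. The helper name show_match is hypothetical.

```shell
# Hypothetical helper: print what match() returns and sets for each input line.
show_match() {
  awk '{
    found = match($0, /RTLOG_.*\.dat/)
    # On success RSTART is the 1-based start and RLENGTH the length;
    # on failure RSTART is 0 and RLENGTH is -1, so substr() returns "".
    print found, RSTART, RLENGTH, "[" substr($0, RSTART, RLENGTH) "]"
  }'
}

printf '%s\n' 'in /data/in/RTLOG_4403_20170108010211_1.dat' 'no file name here' | show_match
```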

    Here is the same solution in non-one-liner form:

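    awk '{
            # Capture whichever error code appears on the line
            match($0, /DATAUNEXPECTEDSTOREDAY|STOREDAYNOTREADYTOBELOAD/)
            if (substr($0, RSTART, RLENGTH)) {
                A = substr($0, RSTART, RLENGTH)
            }
            # Capture the RTLOG_<store>_<timestamp>_<seq>.dat file name
            match($0, /RTLOG_.*\.dat/)
            if (substr($0, RSTART, RLENGTH)) {
                # Q[2] = store number, Q[3] = 14-digit timestamp
                split(substr($0, RSTART, RLENGTH), Q, "_")
                # Print error, store and the first 8 digits (YYYYMMDD) of the timestamp
                print A OFS Q[2] OFS substr(Q[3], 1, 8)
            }
        }' OFS="|" Input_file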
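As an alternative, here is a minimal sketch that splits on the "~" separators instead of using match(). It assumes every line has the layout shown in the question, with the message (error code first, file path last) in the final "~" field; the function name summarize_rtlog_errors is hypothetical.

```shell
# Hypothetical helper: field-based alternative to the match() version.
# Assumes the ~-separated layout from the question, with the message
# "<ERROR> on FHEAD record ... in /path/RTLOG_<store>_<timestamp>_<seq>.dat"
# in the last field.
summarize_rtlog_errors() {
  awk -F'~' '{
    n = split($NF, words, " ")       # words[1] = error code, words[n] = file path
    m = split(words[n], parts, "/")  # parts[m] = RTLOG_<store>_<timestamp>_<seq>.dat
    split(parts[m], f, "_")          # f[2] = store, f[3] = 14-digit timestamp
    # Only print lines whose last word really is an RTLOG file name
    if (f[1] == "RTLOG")
      print words[1], f[2], substr(f[3], 1, 8)
  }' "$@"
}
```

For example, summarize_rtlog_errors Input_file > errors_summary.txt writes one line per log entry in the DATAUNEXPECTEDSTOREDAY 4403 20170108 format (errors_summary.txt is a placeholder name). Because the error code and the file name come from the same line, no value can carry over from a previous line, which can happen with variable A in the match() version when a line contains an RTLOG file name but neither error string.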