
Efficient way to read key=value pairs from each line of a file in bash


I need suggestions for a better way to process a file in bash, especially the key=value pairs on each line.

I am trying to perform the following task on log lines:

  1. if the word "critical/warning" appears, print the value of request_id on a new line
  2. if the key IPA has the value "MASKED", append " MASK" to the request_id in the output

I wrote the code below to do this:

while IFS= read -r line
do
  # only handle lines that contain both "critical/warning" and "request_id="
  if echo "$line" | grep "critical/warning" | grep -q "request_id="
  then
    request_id=$(echo "$line" | awk -F"request_id=" '{print $2}' | awk '{print $1}')
    if echo "$line" | grep -q "IPA="
    then
      IPA=$(echo "$line" | awk -F"IPA=" '{print $2}' | awk '{print $1}')
      # the value keeps its surrounding double quotes, so compare against "MASKED" with quotes
      [[ "$IPA" == '"MASKED"' ]] && request_id="$request_id MASK"
    fi
    echo "$request_id"
  fi
done < test.txt

Below is a sample log file:

Apr 10 11:17:35 jalaltu app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=69dff0hba0nv HTTP/1.1" 200 148 "https://jalaltu.com" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0
Apr 10 11:17:35 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?key=s2fwad2Es2" host=jalaltu.com request_id=b19a87a1-1bbb-4e67-b207-bd9f23d46afa IPA="108.31.000.000" dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https

Apr 10 11:17:35 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=jalaltu.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65 IPA="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https

Apr 10 11:17:35 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=jalaltu.com request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2 IPA="108.31.000.000" dyno=web.3 connect=1ms service=10ms status=200 bytes=4435 protocol=https

Apr 10 11:17:35 jalaltu app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?key=s2fwad2Es2 HTTP/1.1" 200 4263 "https://jalaltu.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36

Apr 10 11:17:35 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=4eiramcmayu0" host=jalaltu.com request_id=d48278c2-5731-464e-be38-ab9ad84ac4a8 IPA="108.31.000.000" dyno=web.4 connect=1ms service=7ms status=200 bytes=3194 protocol=https

Apr 10 11:17:35 jalaltu app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q HTTP/1.1" 200 4263 "https://jalaltu.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36

Apr 10 11:17:35 jalaltu app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q HTTP/1.1" 200 4263 "https://jalaltu.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36

Apr 10 11:17:36 jalaltu app/web.4: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=4eiramcmayu0 HTTP/1.1" 200 3023 "https://jalaltu.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36

Apr 10 11:17:36 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=jalaltu.com request_id=8bb2413c-3c67-4180-8091-000313b8d9ca IPA="MASKED" dyno=web.3 connect=1ms service=32ms status=200 bytes=4435 protocol=https

Apr 10 11:17:36 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=jalaltu.com request_id=10f93da3-2753-48a3-9485-857a93d8a88a IPA="MASKED" dyno=web.3 connect=1ms service=37ms status=200 bytes=4435 protocol=https

Below is the output for the sample log file:

b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca MASK
10f93da3-2753-48a3-9485-857a93d8a88a MASK

Solution

  • Assumptions:

    • while the sample data shows request_id always coming before IPA, I'm going to assume this may not always be the case

    One idea using a single awk invocation (which should be a bit faster than the current bash looping construct with several sub-process calls to echo/grep/awk):

    awk '
    /critical[/]warning/ &&                                           # if line contains "critical/warning" and ...
    /request_id/ { mask=""                                            # line contains "request_id", clear the "mask" variable
                   for (i=1 ; i<=NF; i++)                             # loop through our input fields 
                       { split($(i),arr,"=")                          # split current field on "=", store results in array "arr[]"
                         if ( arr[1] == "request_id" )                # if field is "request_id" ...
                            { reqid = arr[2] }                        # save the associated id
                         if ( arr[1] == "IPA" && arr[2] ~ "MASKED" )  # if field is "IPA" and value matches "MASKED" ...
                            { mask = " MASK"  }                       # set our "mask" variable
                       }
                    print reqid mask                                  # print our variables
                 }
    ' log.dat
    

    NOTE: the comments can be removed to declutter the code

    The above generates:

    b19a87a1-1bbb-4e67-b207-bd9f23d46afa
    910b07d1-3f71-4347-a1a7-bfa20384ef65
    097bf65e-e189-4f9f-9dfb-4758cff411b2
    d48278c2-5731-464e-be38-ab9ad84ac4a8
    8bb2413c-3c67-4180-8091-000313b8d9ca MASK
    10f93da3-2753-48a3-9485-857a93d8a88a MASK
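If the processing has to stay in a shell loop rather than awk, the per-line sub-process calls can also be avoided with shell built-ins alone (case patterns and parameter expansion). The following is only a minimal sketch, not part of the answer above: print_request_ids is a hypothetical helper name, it assumes fields are separated by single spaces, and it requires the IPA value to be exactly "MASKED" (as the question's code does) rather than merely containing it. Like the awk version, it does not depend on request_id coming before IPA.

```shell
# Hypothetical helper: reads log lines on stdin and prints one request id per
# matching line, using only shell built-ins (no echo/grep/awk per line).
# Assumption: fields are separated by single spaces.
print_request_ids() {
  while IFS= read -r line || [ -n "$line" ]; do
    # requirement 1: the line must mention "critical/warning"
    case $line in
      *critical/warning*) ;;
      *) continue ;;
    esac

    # skip lines that have no request_id=... field
    padded=" $line"
    case $padded in
      *" request_id="*) ;;
      *) continue ;;
    esac

    reqid=${padded#*" request_id="}   # drop everything up to and including the key
    reqid=${reqid%% *}                # keep the value up to the next space

    # requirement 2: flag the line when the IPA field is exactly "MASKED"
    mask=""
    case " $line " in
      *' IPA="MASKED" '*) mask=" MASK" ;;
    esac

    printf '%s%s\n' "$reqid" "$mask"
  done
}
```

It would be called as print_request_ids < test.txt. For very large logs the single awk invocation is still likely to be the faster option; the loop is mainly useful when other per-line shell logic is needed anyway.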