Search code examples
shellunixawksedgrep

find line that contains string & echo the value to a new line using shell script


i want to find the solution in bash script: i have the raw output logs. Each line begins with a date, e.g. Apr 10 11:17:35

I want to Loop through each log item, and find the lines that contain the string coderbyte heroku/router. For each of those, echo the request_id value to a new line, and if the fwd key has the value of MASKED, then add a [M] to the end of the line with a space before it

output log

Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=69dff0hba0nv HTTP/1.1" 200 148 "https://coderbyte.com" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?key=s2fwad2Es2" host=coderbyte.com request_id=b19a87a1-1bbb-4e67-b207-bd9f23d46afa fwd="108.31.000.000" dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65 fwd="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2 fwd="108.31.000.000" dyno=web.3 connect=1ms service=10ms status=200 bytes=4435 protocol=https

Apr 10 11:17:35 coderbyte app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?key=s2fwad2Es2 HTTP/1.1" 200 4263 "https://coderbyte.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=4eiramcmayu0" host=coderbyte.com request_id=d48278c2-5731-464e-be38-ab9ad84ac4a8 fwd="108.31.000.000" dyno=web.4 connect=1ms service=7ms status=200 bytes=3194 protocol=https

Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q HTTP/1.1" 200 4263 "https://coderbyte.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36

Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q HTTP/1.1" 200 4263 "https://coderbyte.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36

Apr 10 11:17:36 coderbyte app/web.4: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=4eiramcmayu0 HTTP/1.1" 200 3023 "https://coderbyte.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36

Apr 10 11:17:36 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=8bb2413c-3c67-4180-8091-000313b8d9ca fwd="MASKED" dyno=web.3 connect=1ms service=32ms status=200 bytes=4435 protocol=https

Apr 10 11:17:36 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=10f93da3-2753-48a3-9485-857a93d8a88a fwd="MASKED" dyno=web.3 connect=1ms service=37ms status=200 bytes=4435 protocol=https

here is my code

#!/bin/bash
curl -s https://coderbyte.com/api/challenges/logs/web-logs-raw -O > /dev/null
#cat web-logs-raw

grep -F "coderbyte heroku/router" web-logs-raw >> test
cat test

This so far can filter the logs and find the lines that contain the string coderbyte heroku/router. but how do i echo the request_id value to a new line, and if the fwd key has the value of MASKED, then add a [M] to the end of the line with a space before it.

Output should be like this

b19a87a1-1bbb-000-00000
b19a87a1-1bbb-000-11111
8bb2413c-3c67-4180-22222 [M]
10f93da3-2753-48a3-33333 [M]

Solution

  • One liner:

     awk '/coderbyte heroku\/router/ { split($10,map,"=");id=map[2];split($11,map1,"\"");print map1[2]=="MASKED"?id" [M]":id }' web-logs-raw
    

    Explanation:

    awk '/coderbyte heroku\/router/ { # Search for lines with required text
              split($10,map,"="); # Split the 10th space delimited field into the array map using "=" as the field separator
              id=map[2]; # Set the variable id to the the second index of the map array
              split($11,map1,"\""); # Split the 11th field into the array map1 using " as the field separator (this is the masked variable)
              print map1[2]=="MASKED"?id" [M]":id # If the masked entry is MASKED, print "[M]" and then the id otherwise just print the id
         }' web-logs-raw  
    

    Output:

    b19a87a1-1bbb-4e67-b207-bd9f23d46afa
    910b07d1-3f71-4347-a1a7-bfa20384ef65
    097bf65e-e189-4f9f-9dfb-4758cff411b2
    d48278c2-5731-464e-be38-ab9ad84ac4a8
    8bb2413c-3c67-4180-8091-000313b8d9ca [M]
    10f93da3-2753-48a3-9485-857a93d8a88a [M]