linux, bash, shell, command-line, log-analysis

Is there a Linux command that can cut and pick columns that match string patterns?


I need to analyze logs, and my end user has to be able to see them in a formatted way, as shown below. The nature of my logs is that the key variables appear in different positions rather than at fixed columns, because these log formats come from various applications.

"thread":"t1","key1":"value1","key2":"value2",......"key15":"value15"

I have a way to split and cut this to analyze only particular keys, using the following:

cat file.txt | grep 'value1' | cut -d',' -f2,7,8-

This is the command I have been able to come up with. The requirement is that I need to grep all logs which have 'key1' as 'value1'; this value1 will most likely be unique among all lines, so I am using a plain grep (if required, I can grep on the key and value string together). The main problem I am facing is the part after cut: I want to pick only key2, key7 and key8 from these lines, but they might not appear at the same column numbers or in this order; key2 might be at column 3 or 4, or even after key7/key8. So I want to pick based on the key name and get exactly

"key2":"value2", "key7":"value7", "key8:value8"

The end user is not particularly picky about the order in which the keys appear; they only need these keys from each line to be displayed. Can someone help me? I tried piping through awk / grep again, but they still match the entire line, not the individual columns.
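To illustrate the kind of per-column extraction I am after: a rough sketch with GNU grep (assuming the -E and -o options are available) on the simplified format above would be

grep '"key1":"value1"' file.txt | grep -Eo '"(key2|key7|key8)":"[^"]*"'

but that prints each matched key/value pair on its own output line instead of keeping the pairs of a single log line together, so it is not quite what I want.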

My input is


{"@timestamp":"2021-08-05T06:38:48.084Z","level":"INFO","thread":"main","logger":"className1","message":"Message 1"} {"@timestamp":"2021-08-05T06:38:48.092Z","level":"DEBUG","thread":"main","logger":"className2","message":"Message 2"} {"@timestamp":"2021-08-05T06:38:48.092Z","level":"DEBUG","thread":"thead1","logger":"className2","message":"Message 2"}


I basically want to find only the "thread":"main" lines and, for each matching line, print only the keys and values of "logger" and "message", since the other keys and values are irrelevant to me. There are more than 15 or 16 keys in my file, and the key positions can be swapped: "message" could appear first and "logger" second in some log files. Of course, these keys are just an example; the real keys I am trying to find are not "logger" and "message" alone.

There are log analysis tools, but this is a pretty old system, the logs are not real-time ones, and the files I am analyzing and displaying are several years old.
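If a JSON-aware tool such as jq happened to be installed (which I cannot count on, given the age of the system), I believe the selection I am describing would look roughly like

jq -c 'select(.thread == "main") | {logger, message}' file.txt

which prints one compact object with just "logger" and "message" per matching record, but I am looking for something that works with the standard tools.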


Solution

  • Not sure I really understand your specification, but the following awk script could be a starting point:

    $ cat foo.awk
    BEGIN {
      k["\"key1\""] = 1; k["\"key7\""] = 1; k["\"key8\""] = 1;
    }
    /"key1":"value1"/ {
      s = "";
      for(i = 1; i <= NF; i+=2)
        if($i in k)
          s = s (s ? "," : "") $i ":" $(i+1);
      print s;
    }
    $ awk -F',|:' -f foo.awk foo.txt
    "key1":"value1","key7":"value7","key8":"value8"
    

    Explanation:

    • awk is called with the -F',|:' option so that the field separator in each record is either a comma or a colon.
    • In the BEGIN section we declare an associative array (k) of the selected keys, including the surrounding double quotes.
    • The rest of the awk script applies to each record containing "key1":"value1".
      • Variable s is used to prepare the output string; it is initialized to "".
      • For each odd-numbered field (the keys) we check whether it is in k. If it is, we append to s:
        • a comma if s is not empty,
        • the key field,
        • a colon,
        • the following even field (the value).
      • We print s.
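
    The same idea should carry over to your JSON-style sample input, with one caveat: the timestamp value itself contains colons, so it is safer to split on commas only and inspect the part of each field before the first colon. A sketch along those lines (bar.awk and logs.txt are just placeholder names, and it assumes the values themselves contain no commas):

    $ cat bar.awk
    /"thread":"main"/ {                      # keep only the records of interest
      s = "";
      for(i = 1; i <= NF; i++) {
        split($i, kv, ":");                  # key name is the part before the first colon
        if(kv[1] ~ /^"(logger|message)"$/) {
          f = $i;
          sub(/}$/, "", f);                  # drop the closing brace on the last field
          s = s (s ? "," : "") f;
        }
      }
      print s;
    }
    $ awk -F',' -f bar.awk logs.txt
    "logger":"className1","message":"Message 1"
    "logger":"className2","message":"Message 2"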