Search code examples
shellawktextzsh

AWK command to process output from a script


I have a shell script (zsh) that outputs this (set of database count results) with some regular text dispersed throughout:

Regular lines of output that should be ignored.
+-----------------------------------------+
| rowcount.account_config records deleted |
+-----------------------------------------+
|                                       2 |
+-----------------------------------------+
+----------------------------------------+
| rowcount.accounts_user records deleted |
+----------------------------------------+
|                                      2 |
+----------------------------------------+

and I want transform it into text of this format:

Regular lines of output that should be ignored.
- account_config records deleted: 2
- accounts_user records deleted: 2

SO far, the closed I have is this:

functionThatOutputsText | awk -F '|' '/rowcount\./ {print "- " $2 ": " $NF}'

Which is not right:

-  rowcount.account_config records deleted : 
-  rowcount.accounts_user records deleted : 

Update Tuesday 28 May 2024, 09:30:24 AM

I took @tripleee's answer and made a minor adjustment to remove rowcount. from the output:

functionThatOutputsText | sed -e 's/rowcount.//' | awk -v OFS=": " '/^\|.*\|$/ {
  if(p) { print "- " p, $2; p="" }
  else { p=substr($0, 3, length($0)-4) }
  next }
/^\+-*\+$/ { next } 1' 

That gave me exactly what I wanted:

Regular lines of output that should be ignored.
- account_config records deleted: 2
- accounts_user records deleted: 2

I do have control of the script so I can also just remove the output of "rowcount." in the first place. I had it there as a marker if I needed to help target the lines of output I wanted to transform.


Looking at @pmf's comments, that can work too. Those counts were coming from mysql with:

SELECT ROW_COUNT() as 'rowcount.account_config records deleted';

SO I changed them to:

SELECT ROW_COUNT() as 'rowcount.account_config records deleted'\G

Which changes the count output to:

*************************** 1. row ***************************
rowcount.account_config records deleted: 2
*************************** 1. row ***************************
rowcount.accounts_user records deleted: 2

And then I adjusted the script to:

functionThatOutputsText | grep -Ev '[*]{3,}' | sed -e 's/rowcount./- /'

And that gives me the output I want too.


Solution

  • As suggested in comments, the database you are querying can almost certainly be configured to output the results in a machine-readable format. Probably explore that instead, or as well.

    Having said that, your attempt has multiple problems.

    • You are doing nothing to print lines outside of the database output.
    • You are doing nothing to print the number field. The value of $NF is from the current line, so you are basically printing the empty string after the final | when you print $NF.

    Here is an alternative attempt.

    awk -v OFS=": " '/^\|.*\|$/ {
      if(p) { print "- " p, $2; p="" }
      else { p=substr($0, 3, length($0)-4) }
      next }
    /^\+-*\+$/ { next } 1'
    

    The variable p is a simple flip-flop; when it is unset, we save the current match (which should be the header line); if we already have a header line, we print the header and the current value. (The value of OFS controls what , in print produces.)

    Other than that, skip lines which consist of only plus signs and dashes, and print any other lines (the final 1).

    Demo: https://ideone.com/EIp81U

    This assumes that there is only one line of output per query. It would not be terribly hard to change it to extract multiple lines per query, but since your example doesn't contain any such outputs, I didn't go there.