Filter a line through an external program


I have a source code management system where I write annotated source and use sed to convert that into pure source, markdown documentation, and test cases.

I would like to have an annotation that allows me to write

other text...
PR(
eval (internal);e expr env env -> val
PR)
other text...

and end up having the string inside PR tags converted into a table:

other text...
<table>
  <thead>
    <tr>
      <th colspan="2">eval (internal)</th>
    </tr>
  </thead>
  <tr>
    <td>e</td>
    <td>a Lisp expression</td>
  </tr>
  <tr>
    <td>env</td>
    <td>an Environment</td>
  </tr>
  <tr>
    <td><i>Returns:</i></td>
    <td>val</td>
  </tr>
</table>
other text...

editor's note (@Fravadona): the indentation doesn't matter in the expected output.

The basic algorithm takes the text before the ; as the header, then reads the rest of the line two tokens at a time. If the first token is a name, it goes inside a td as is; if it is "->", the "Returns:" text goes in the td instead. The second token is a key into a dictionary that looks something like this:

env   -> an Environment
val   -> a Lisp value
vals  -> some Lisp values
lvals -> a Lisp list of Lisp values
num   -> a number
nums  -> some numbers
...

The dictionary lookup keeps key/value pairs of C strings and traverses them with strcmp().

I may have reached the limit of my sed skills here; I don't even know if this is possible. I have written the conversion program myself in C, but I don't know how to plug it into sed.

I'm experimenting with the e command of sed. This works:

cat constcl.md | sed 's/\(eval (.*);.*\)/printf "%s" "$(echo "\1" | tr e i)"/e' |less

But if I try to simplify the regex or substitute my own command, it all goes bonkers.


Solution

  • I have to say, sed isn't ideal for this task. An Awk/Python/Perl/etc. solution is probably a better fit.

    Let's assume that your dictionary is stored in a dict.txt file with this format:

    env   -> an Environment
    val   -> a Lisp value
    vals  -> some Lisp values
    lvals -> a Lisp list of Lisp values
    num   -> a number
    nums  -> some numbers
    expr  -> an Expression
    

    And that your "template" is in the following template.txt file:

    other text...
    PR(
    eval (internal);e expr env env -> val
    PR)
    other text...
    

    Then here's how you could expand the PR blocks using Awk.
    The main idea is to load the key/value pairs from dict.txt first, and then process template.txt to generate the HTML tables. Don't forget to escape your strings for use as HTML text; I added a function for that.

    awk '
        # remove the potential CR characters in the input line
        { gsub(/\r/, ""); }
    
        # load the key/values pairs from dict.txt
        # NOTE: NR is equal to FNR only while processing the first file
        NR == FNR {
            if (match($0, /[[:space:]]*->[[:space:]]*/))
                dict[substr($0, 1, RSTART-1)] = substr($0, RSTART+RLENGTH);
            next;
        }
    
        # expand the PR blocks as HTML tables in the remainder file(s)
        $1 == "PR(" { inside_pr_block = 1; next; }
        $1 == "PR)" { inside_pr_block = 0; next; }
        inside_pr_block {
            if (match($0, /;/)) {
                printf "<table>";
                th = substr($0, 1, RSTART-1);
                printf "<thead><tr><th colspan=\"2\">%s</th></tr></thead>", \
                    html_textify(th);
                $0 = substr($0, RSTART+RLENGTH);
                for (i = 1; i <= NF; i += 2) {
                    td1 = ($i == "->" ? "Returns:" : $i);
                    td2 = dict[$(i+1)];
                    printf "<tr><td>%s</td><td>%s</td></tr>", \
                        html_textify(td1), html_textify(td2);
                }
                print "</table>";
            }
            next;
        }
    
        # output non PR lines
        { print; }
    
        # minimalist function that encodes a string as HTML text
        function html_textify(str) {
            gsub(/&/, "\\&amp;", str);
            gsub(/</, "\\&lt;", str);
            gsub(/>/, "\\&gt;", str);
            return str;
        }
    ' dict.txt template.txt
    

    With the given input files, Awk outputs (the indentation is added by me):

    other text...
    <table>
      <thead>
        <tr>
          <th colspan="2">eval (internal)</th>
        </tr>
      </thead>
      <tr>
        <td>e</td>
        <td>an Expression</td>
      </tr>
      <tr>
        <td>env</td>
        <td>an Environment</td>
      </tr>
      <tr>
        <td>Returns:</td>
        <td>a Lisp value</td>
      </tr>
    </table>
    other text...
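If you would rather keep your existing C converter and only need something to hand it the PR lines, Awk can also pipe individual lines to an external program. The following is a minimal sketch, not a drop-in solution: the function name filter_pr_blocks is made up for illustration, and `tr e i` (borrowed from the question) is only a stand-in for the real converter, which I assume reads one line on stdin and writes its result to stdout.

```shell
# Sketch: send only the lines between PR( and PR) through an external filter.
# filter_pr_blocks is a hypothetical helper name; "tr e i" is a placeholder
# for the real converter program (assumed to read stdin and write stdout).
filter_pr_blocks() {
    awk -v cmd="${1:-tr e i}" '
        $1 == "PR(" { inside_pr_block = 1; next }
        $1 == "PR)" { inside_pr_block = 0; next }
        inside_pr_block {
            fflush()       # flush pending output so the ordering stays correct
            print | cmd    # write the current line to the filter
            close(cmd)     # wait for the filter to finish before continuing
            next
        }
        { print }          # pass every other line through unchanged
    '
}

# demo with the template from above
printf '%s\n' 'other text...' 'PR(' 'eval (internal);e expr env env -> val' \
    'PR)' 'other text...' | filter_pr_blocks
# prints:
#   other text...
#   ival (intirnal);i ixpr inv inv -> val
#   other text...
```

The filter writes straight to the same stdout as Awk, which is why the sketch flushes before starting it and closes the pipe afterwards; without that, the filtered line could show up out of order when the output goes to a file or a pipe. To use your own program, pass it as the first argument, for example `filter_pr_blocks ./myconverter < template.txt` (the path is a placeholder).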