Using awk to process html-related Gift-format Moodle questions

This is basically a awk question but it is about processing data for the Moodle Gift format, thus the tags.

I want to format html code in a question (Moodle "test" activity) but I need to replace < and > with the corresponding entities, as these will be interpreted as "real" html, and not printed. However, I want to be able to type the question with regular code and post-process the file before importing it as gift into Moodle.

I thought awk would be the perfect tool to do this.

Say I have this (invalid as such) Moodle/gift question:

::q1::[html]This is a question about HTML:
<pre>
<p>some text</p>
</pre>
and some tag:<code><img></code>
{T}

What I want is a script that translates this into a valid gift question:

::q1::[html]This is a question about HTML:
<pre>
&lt;p&gt;some text&lt;/p&gt;
</pre>
and some tag:<code>&lt;img&gt;</code>
{T}

key point: replace < and > with < and > when:

inside a <pre>-</pre> bloc (assuming those tags are alone on a line)
between <code>and </code>, with arbitrary string in between.

For the first part, I'm fine. I have a shell script calling awk (gawk, actually).

awk -f process_src2gift.awk $1.src >$1.gift

with process_src2gift.awk:

BEGIN { print "// THIS IS A GENERATED FILE !" }
{
    if( $1=="<pre>" ) # opening a "code" block
    {
        code=1;
        print $0;
    }
    else
    {
        if( $1=="</pre>" ) # closing a "code" block
        {
            code=0;
            print $0;
        }
        else
        { # if "code block", replace < > by html entities
            if( code==1 )
            {
                gsub(">","\\&gt;");
                gsub("<","\\&lt;");
            }
            print $0;
        }
    }
}
END { print "// END" }

However, I'm stuck with the second requirement..

Questions:

Is it possible to add to my awk script code to process the hmtl code inside the <code> tags? Any idea ? I thought about using sed but I didn't see how to do that.
Maybe awk isn't the right tool for that ? I'm open for any suggestion on other (standard Linux) tool.

Solution

Answering own question.

I found a solution by doing a two step awk process:

first step as described in question
second step by defining <code> or </code> as field delimiter, using a regex, and process the string replacement on second argument ($2).

The shell file becomes:

echo "Step 1"
awk -f process_src2gift.awk $1.src >$1.tmp

echo "Step 2"
awk -f process_src2gift_2.awk $1.tmp >$1.gift

rm $1.tmp

And the second awk file (process_src2gift_2.awk) will be:

BEGIN { FS="[<][/]?[c][o][d][e][>]"; }
{
    gsub(">","\\&gt;",$2);
    gsub("<","\\&lt;",$2);
    if( NF >= 3 )
        print $1 "<code>" $2 "</code>" $3
    else
        print $0
}

Of course, there are limitations:

no attributes in the <code> tag
only one pair <code></code> in the line
probably others...