I'm currently working on a short SED script that needs to HTML-encode parts of an XML file. The script currently looks like this:
sed.exe "/<messageData>/,/<\/messageData>/ {/<messageData>/b;/<\/messageData>/b; s/</\&lt;/g; s/>/\&gt;/g; }" %1 >%2
So basically, replace all < and > with &lt; and &gt; between the <messageData> and </messageData> tags.
This script works perfectly well with pretty-printed XML; that is,
<?xml version="1.0" encoding="ISO-8859-1"?>
<Messages>
<messageData>
<test>DATA</test>
</messageData>
</Messages>
comes out as
<?xml version="1.0" encoding="ISO-8859-1"?>
<Messages>
<messageData>
&lt;test&gt;DATA&lt;/test&gt;
</messageData>
</Messages>
which is what I need. My issue is that the files I need to process are not pretty-printed; everything is on a single line, like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<Messages><messageData><test>DATA</test></messageData></Messages>
With this format, the script no longer works. Would it be possible to modify my script to work with both formats?
Please note that I'm not able to affect the output format, and that SED is the scripting engine to be used.
I guess I could just create another SED script that inserts a line break after each > in the file, and then run the script I have now. However, I'm guessing that would not be very efficient performance-wise (two passes over each file).
Any suggestions?
Regards Daniel
In case someone happens to stumble on the same issue, this is how we solved it. I know it's not pretty, but it will have to do until we can use a better solution.
sed.exe -i "s/\(>\)\(<\)/\1\n\2/g" %1
sed.exe "/<messageData>/,/<\/messageData>/ {/<messageData>/b;/<\/messageData>/b; s/</\&lt;/g; s/>/\&gt;/g; }" %1 >%2
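For anyone who would rather not modify the input file in place, the same two steps can probably be chained with a pipe instead. This is only a sketch: it assumes a POSIX-style shell rather than a Windows batch file, and GNU sed (the \n in the replacement text is a GNU extension). The function name encode_message_data is made up for illustration.

```shell
# Hypothetical helper (name is an assumption): encode_message_data INPUT > OUTPUT
# Assumes GNU sed; other sed implementations may not accept \n in the replacement.
encode_message_data() {
    # Step 1: insert a line break between every adjacent "><" pair, so each
    # tag lands on its own line. The input file itself is left untouched.
    sed 's/></>\n</g' "$1" |
    # Step 2: inside the messageData range, skip the two tag lines themselves
    # and replace < and > with their entities on every other line.
    sed '/<messageData>/,/<\/messageData>/ {
        /<messageData>/b
        /<\/messageData>/b
        s/</\&lt;/g
        s/>/\&gt;/g
    }'
}
```

It would be called as encode_message_data input.xml > output.xml. As with the two commands above, the result is split onto several lines, and it is still two sed processes, just without a temporary rewrite of the input file.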