
How to insert a literal '&' using setAttribute()


I am using XML::LibXML (2.0018; perl 5.16.3). I have a hash that contains a series of attributes that are then applied to an XML document using setAttribute(). This is for updating a tomcat server.xml file that needs to be modified to work with an apache httpd front end, and it is executed by the continuous deployment scripts.

The adding of the basic attributes works fine:

use XML::LibXML qw ();

...

my %tmphash = ( port => "8581", address => "127.0.0.1", ... );

...

Then in some method that takes a hash reference:

foreach my $key (keys %$hashConnRef) {
  $connector->setAttribute("$key" => $hashConnRef->{$key});
}

All well and good, until now: I need to add an attribute that needs a literal & in the output, so that tomcat will properly pick it up.

The attribute to be put into the server.xml file should look like this (desired result):

relaxedQueryChars="[]|{}^&#x5c;&#x60;&quot;&lt;&gt;"

However, the setAttribute() call conveniently converts the "&" into "&amp;", resulting in (current output):

relaxedQueryChars="[]|{}^&amp;#x5c;&amp;#x60;&amp;quot;&amp;lt;&amp;gt;"

I've tried escaping (and double-escaping) the entry in the hash, such as:

relaxedQueryChars => "[]|{}^\&#x5c;\&#x60;\&quot;\&lt;\&gt;"

Unfortunately, in the former case it simply put \&#x60, and in the latter it added a \ before the &. How can I define the string in the hash so that it will process through the setAttribute and properly emit the &#x5c?

Per a request, here is a full example:

/tmp/min.xml (essentially everything from a tomcat conf/server.xml stripped):

<?xml version="1.0" encoding="utf-8"?>
<Server port="8385" shutdown="SHUTDOWN">
  <Service name="Catalina">
  </Service>
</Server>

And a minimal example program:

#!/usr/bin/perl -w

use strict;
use warnings;

use XML::LibXML qw ( );

my %tmphash = (
  port => "8381",
  address => "127.0.0.1",
  relaxedQueryChars => "[]|{}^\&#x5c;\&#x60;\&quot;\&lt;\&gt;"
  );

sub edit_server_xml {
  my ($serverFile, $hashConnRef) = @_;

  my $parser = XML::LibXML->new();

  my $doc = $parser->parse_file($serverFile);

  for my $server ($doc->findnodes("/Server")) {
    # delete all of the defined connectors
    for my $service ($server->findnodes("Service")) {
      for my $connector ($service->findnodes("Connector")) {
        $service->removeChild($connector);
      }
    }

    # build the replacement connector from the attribute hash
    my $connector = $doc->createElement("Connector");
    for my $service ($server->findnodes("Service")) {
      foreach my $key (keys %$hashConnRef) {
        $connector->setAttribute("$key" => $hashConnRef->{$key});
      }

      $service->appendChild($connector);
      $service->appendTextNode("\n");
    }

    $doc->toFile($serverFile);
  }
}

edit_server_xml("/tmp/min.xml", \%tmphash);

Resultant line that is incorrect:

<Connector address="127.0.0.1" relaxedQueryChars="[]|{}^&amp;#x5c;&amp;#x60;&amp;quot;&amp;lt;&amp;gt;" port="8381"/>

Solution

  • I think basically the only change you need is relaxedQueryChars => "[]|{}^\\`\"<>" - don't pre-encode anything, libxml will take care of all necessary entity-encoding:

    #!perl
    use strict;
    use warnings;
    use XML::LibXML;
    
    my $doc = XML::LibXML->load_xml(string=>'<f/>');
    $doc->documentElement->setAttribute('foo' => '[]|{}<>\\&#');
    print $doc->toString
    
    __END__
    
    <?xml version="1.0"?>
    <f foo="[]|{}&lt;&gt;\&amp;#"/>
    

    Your fear that a backslash "escapes the next character" in XML is not supported by Wikipedia: XML has no backslash escaping, and the ampersand character & is the one that introduces the entity and character references used to encode all problematic characters.
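
    To check this for the actual relaxedQueryChars value, here is a minimal sketch (not the asker's script). It assumes a bare <Service/> document as a stand-in for server.xml, and the variable names are made up for illustration. It sets the attribute from the raw characters, serializes, reparses, and compares the decoded value against the hand-encoded form from the question. The double quote and angle brackets should come out as entities, while the backslash and backtick should be written as-is; either way a conforming XML parser decodes both forms to the same string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# The raw characters the XML parser should hand to Tomcat: [ ] | { } ^ \ ` " < >
# Single-quoted Perl string: \\ is one backslash, everything else is literal.
my $raw = '[]|{}^\\`"<>';

# Stand-in document (assumption: a bare <Service/> instead of a real server.xml).
my $doc  = XML::LibXML->load_xml(string => '<Service/>');
my $conn = $doc->createElement('Connector');

# Pass the raw value; libxml escapes what needs escaping when serializing.
$conn->setAttribute(relaxedQueryChars => $raw);
$doc->documentElement->appendChild($conn);

my $xml = $doc->toString;
print $xml;

# Reparse the serialized text: the decoded attribute value is the raw string again.
my $reparsed = XML::LibXML->load_xml(string => $xml);
my ($node)   = $reparsed->findnodes('/Service/Connector');
print "round trip ok\n"
    if $node->getAttribute('relaxedQueryChars') eq $raw;

# The hand-encoded form from the question decodes to the very same value.
my $by_hand = XML::LibXML->load_xml(
    string => '<Connector relaxedQueryChars="[]|{}^&#x5c;&#x60;&quot;&lt;&gt;"/>',
);
print "same value as the entity-encoded form\n"
    if $by_hand->documentElement->getAttribute('relaxedQueryChars') eq $raw;
```

    If both messages print, the file written by setAttribute() is equivalent, as far as any XML parser is concerned, to the one with &#x5c; and &#x60; spelled out, so there is no need to force literal character references into the output.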