
How to prevent XML::LibXML from saving modified XML with self-closing tags


The following working code reads my XML file, which contains many empty elements, applies two changes, and saves the result under a different name. However, it also rewrites empty elements such as <element></element> as self-closing tags such as <element />, which I do not want.
How can I save the file without self-closing tags? In other words, how do I tell XML::LibXML to keep separate start and end tags for empty elements? The original file is produced by a commercial application that writes empty elements this way, so I want to preserve that style.

#! /usr/bin/perl

use strict;
use warnings;
use XML::LibXML;

my $filename = 'out.xml';
my $dom = XML::LibXML->load_xml(location => $filename);
my $query = '//scalar[contains(@name, "partitionsNo")]/value';
for my $i ($dom->findnodes($query)) {
    $i->removeChildNodes();
    $i->appendText('16');
}

open my $out, '>', 'out2.xml' or die "Cannot open out2.xml: $!";
binmode $out;
$dom->toFH($out);
close $out or die "Cannot close out2.xml: $!";
# out2.xml now has self-closing tags where the input
# had empty elements with separate start and end tags

Solution

  • Unfortunately, XML::LibXML doesn't support libxml2's xmlsave module, which has a flag (XML_SAVE_NO_EMPTY) to serialize empty elements without self-closing tags.

    As a workaround you can add an empty text node to empty elements:

    for my $node ($doc->findnodes('//*[not(node())]')) {
        # Note that appendText('') doesn't work here; an explicit text node is needed.
        $node->appendChild($doc->createTextNode(''));
    }
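
For context, here is a minimal end-to-end sketch that combines the question's edit with this workaround. The inline sample document and the element name `empty` are made up for illustration; the real input would be loaded from the file as in the question.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input standing in for the application's file.
my $xml = '<root><scalar name="partitionsNo"><value>4</value></scalar>'
        . '<empty></empty></root>';

my $doc = XML::LibXML->load_xml(string => $xml);

# The modification from the question: replace the content of matching values.
for my $value ($doc->findnodes('//scalar[contains(@name, "partitionsNo")]/value')) {
    $value->removeChildNodes();
    $value->appendText('16');
}

# The workaround: give every childless element an empty text node so that
# the serializer writes a start tag and an end tag.
for my $node ($doc->findnodes('//*[not(node())]')) {
    $node->appendChild($doc->createTextNode(''));
}

my $out = $doc->toString();
print $out;
```

Run the padding loop last, just before serializing, so that elements emptied by earlier modifications are covered as well.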
    

    This is a bit costly for large documents, but I'm not aware of a better solution.
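
One alternative that may be worth trying before walking the whole tree: the XML::LibXML documentation describes a package variable, `$XML::LibXML::setTagCompression`, which is said to make the serializer write empty elements as start and end tag pairs. The sketch below assumes that variable behaves as documented in the installed version; the sample document is made up, and the result should be checked against your own XML::LibXML and libxml2 build.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document with one empty element.
my $doc = XML::LibXML->load_xml(string => '<root><a></a><b>text</b></root>');

# Default serialization.
my $compact = $doc->toString();

# Serialization with the documented global set; "local" limits the
# change to this block so other code is not affected.
my $expanded = do {
    local $XML::LibXML::setTagCompression = 1;
    $doc->toString();
};

print $compact, $expanded;
```

If this works in your environment, it avoids modifying the document tree at all.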

    That said, the fragments <foo></foo> and <foo/> are both well-formed and semantically equivalent. Any XML parser or application that treats such fragments differently is buggy.
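
The equivalence can be checked from Perl. The following sketch parses both spellings and compares the resulting trees; the element names are taken from the fragments above, and the `root` wrapper is added only for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $long  = XML::LibXML->load_xml(string => '<root><foo></foo></root>');
my $short = XML::LibXML->load_xml(string => '<root><foo/></root>');

# After parsing, neither element has child nodes or text content.
for my $doc ($long, $short) {
    my ($foo) = $doc->findnodes('/root/foo');
    printf "has children: %s, text: '%s'\n",
        ($foo->hasChildNodes ? 'yes' : 'no'), $foo->textContent;
}

# Canonical XML (C14N) normalizes the spelling of empty elements,
# so both documents serialize to the same string.
my $same = $long->toStringC14N() eq $short->toStringC14N();
print $same ? "canonical forms match\n" : "canonical forms differ\n";
```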


    Note that some people believe the XML spec recommends using self-closing tags, but that's not exactly true. The XML spec says:

    Empty-element tags may be used for any element which has no content, whether or not it is declared using the keyword EMPTY. For interoperability, the empty-element tag should be used, and should only be used, for elements which are declared EMPTY.

    This means elements that are declared EMPTY in a DTD. For other elements, or if no DTD is present, the XML standard advises not to use self-closing tags ("and should only be used"). But this is only a non-binding recommendation for interoperability.