
Are blank child nodes of any use to XML parsers?


Why do we have to have the notion of blank XML nodes? What benefit do they bring to the alchemy of XML parsing?

Here is a simple example with Perl's XML::LibXML:

use strict;
use warnings;
use feature 'say';
use XML::LibXML;

my $xml = XML::LibXML->load_xml( string => <<'XMLDOC' );
<alphabet>
 <child name='alpha'/>
 <child name='bravo'/>
 <child name='charlie'/>
 <child name='delta'/>
 <child name='echo'/>
</alphabet>
XMLDOC

my ( $parent ) = $xml->findnodes( '/alphabet' );

my @all_kids  = $parent->childNodes;
my @real_kids = $parent->nonBlankChildNodes;

say 'All kids : ', scalar @all_kids;   # '11'
say 'Real kids : ', scalar @real_kids; # '5' => 6 blank child nodes

What puzzles me is that the parser makes a distinction between retrieving all child nodes and only non-blank ones.

It would seem then that there must be at least one sane use for these blank nodes. It would be interesting to know exactly what those uses are.


Solution

  • Consider this case from HTML:

    <div><b>hello</b><i>world</i></div>
    

    vs this one:

    <div>
        <b>hello</b>
        <i>world</i>
    </div>
    

    In the first example there are no whitespace nodes, so the rendering engine puts no space between the two words and displays helloworld. In the second example there is a whitespace-only text node between the <b> and <i> elements, so it comes out as hello world.

    You need to know that the whitespace nodes are there, because some XML-based languages care about where they are placed.
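
The same difference can be observed from XML::LibXML itself. The following is only a sketch: `summarize` is a hypothetical helper written for this illustration (it is not part of the library), and the whitespace collapsing is a rough stand-in for what a renderer does, not how any real rendering engine is implemented.

```perl
use strict;
use warnings;
use feature 'say';
use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML): parse a string and report
# total children, non-blank children, and the concatenated text of the root.
sub summarize {
    my ($xml_string) = @_;
    my $root = XML::LibXML->load_xml( string => $xml_string )->documentElement;

    my @all       = $root->childNodes;
    my @non_blank = $root->nonBlankChildNodes;

    return ( scalar @all, scalar @non_blank, $root->textContent );
}

my $compact = '<div><b>hello</b><i>world</i></div>';
my $spaced  = <<'XMLDOC';
<div>
    <b>hello</b>
    <i>world</i>
</div>
XMLDOC

for my $case ( [ compact => $compact ], [ spaced => $spaced ] ) {
    my ( $label, $xml_string ) = @$case;
    my ( $all, $non_blank, $text ) = summarize($xml_string);

    # Rough stand-in for a renderer: collapse whitespace runs, then trim.
    ( my $rendered = $text ) =~ s/\s+/ /g;
    $rendered =~ s/^ | $//g;

    say "$label: $all children, $non_blank non-blank, text '$rendered'";
}
```

Both documents have the same two non-blank children, but only the second one keeps the whitespace-only text nodes that separate hello from world. Dropping them while parsing would make the two documents indistinguishable.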