
How do I create ENTITY declarations in the DOCTYPE using Perl/XML::LibXML?


I'm trying to create the following DTD containing entity declarations:

<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" 
[ <!ENTITY icon.url "https://example.com/icon.png"> 
<!ENTITY base.url "https://example.com/content/" > ]>

I can successfully create the DOCTYPE without the entity declarations:

#!/usr/bin/perl -w
use strict;
use XML::LibXML;

my $doc = XML::LibXML::Document->new('1.0','UTF-8');
my $dtd = $doc->createInternalSubset( "LinkSet", "-//NLM//DTD LinkOut 1.0//EN", "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" );

my $ls = $doc->createElement( "LinkSet" );
$doc->setDocumentElement($ls);

print $doc->toString;
exit;

Results in:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd">
<LinkSet/>

The XML::LibXML documentation shows how to add an entity reference to a document, but not how to declare an entity in the DOCTYPE.
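To illustrate the gap, here is a minimal sketch that extends the code above. `createEntityReference` will write `&base.url;` into the document, but the output still has no matching `<!ENTITY>` declaration in the DOCTYPE. `<ObjectUrl>` is a hypothetical element name used only for illustration.

```perl
use strict;
use warnings;

use XML::LibXML;

my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
$doc->createInternalSubset('LinkSet', '-//NLM//DTD LinkOut 1.0//EN',
    'https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd');

my $ls = $doc->createElement('LinkSet');
$doc->setDocumentElement($ls);

# Hypothetical element holding a reference to the (undeclared) base.url entity
my $url = $doc->createElement('ObjectUrl');
$url->appendChild($doc->createEntityReference('base.url'));
$ls->appendChild($url);

# The reference is serialized as &base.url; but the DOCTYPE has no internal subset
print $doc->toString;
```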

A similar (but PHP-based) question suggests building the ENTITY declarations as a string and parsing that. Is this the best approach in Perl too?


Solution

  • The documentation for XML::LibXML::Document says this:

    [The Document Class] inherits all functions from XML::LibXML::Node as specified in the DOM specification. This enables access to the nodes besides the root element on document level - a "DTD" for example. The support for these nodes is limited at the moment.

    It also makes clear later on that the source of these limitations is libxml2 itself, not the Perl module. This makes sense, as a DTD has a completely different syntax from XML (or even from an XML processing instruction), even though it may look superficially similar.

    The only way appears to be to parse a skeleton document that already contains the required DTD, and then build the rest of the document on top of that.

    Like so:

    use strict;
    use warnings 'all';
    
    use XML::LibXML;
    
    my $doc = XML::LibXML->load_xml(string => <<__END_XML__);
    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" 
    [
      <!ENTITY icon.url "https://example.com/icon.png"> 
      <!ENTITY base.url "https://example.com/content/">
    ]>
    
    <LinkSet/>
    __END_XML__
    
    print $doc;
    

    Output:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" [
    <!ENTITY icon.url "https://example.com/icon.png">
    <!ENTITY base.url "https://example.com/content/">
    ]>
    <LinkSet/>
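If the list of entities isn't fixed, the same parse-a-skeleton approach can be driven from a hash. This is a sketch under stated assumptions, not part of the answer itself: it builds one `<!ENTITY>` line per hash entry, parses the skeleton with the `load_ext_dtd => 0` parser option so libxml2 does not try to fetch `LinkOut.dtd`, and then continues with ordinary DOM calls. `<ObjectUrl>` is a hypothetical element name, and the entity values are assumed to contain no `"`, `%` or `&` characters, which would otherwise need escaping.

```perl
use strict;
use warnings;

use XML::LibXML;

# Entity names and replacement texts (values taken from the question)
my %entities = (
    'icon.url' => 'https://example.com/icon.png',
    'base.url' => 'https://example.com/content/',
);

# One declaration per entry; assumes values need no escaping
my $decls = join "\n",
    map { sprintf '<!ENTITY %s "%s">', $_, $entities{$_} } sort keys %entities;

my $skeleton = <<"__END_XML__";
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" [
$decls
]>
<LinkSet/>
__END_XML__

# load_ext_dtd => 0: do not try to load the external LinkOut.dtd while parsing
my $doc = XML::LibXML->load_xml(string => $skeleton, load_ext_dtd => 0);

# Continue with normal DOM methods; ObjectUrl is a hypothetical element name
my $el = $doc->createElement('ObjectUrl');
$el->appendChild($doc->createEntityReference('base.url'));
$doc->documentElement->appendChild($el);

print $doc->toString;
```

Because the declarations are parsed by libxml2 itself, they are serialized in the internal subset just like the hand-written version, and `createEntityReference` can then refer to them.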