libxml2: xmlIOParseDTD: I/O warning : failed to load HTTP resource


I can't get libxml2 to cleanly parse a DTD from memory: the DTD contains references to external XHTML entities hosted on www.w3.org. The links work, and a browser loads their content just fine. However, libxml2 reports failures to load the HTTP resources, even though xmlIOParseDTD returns a successful result.

Here's the minimal test to reproduce the problem:

#include "libxml/xmlreader.h"
#include <string>
#include <fstream>
#include <iostream>

int main()
{
    // Read DTD from file
    std::ifstream f;
    f.open("enml2.dtd");
    if (!f.is_open()) {
        std::cerr << "Can't open enml2.dtd file" << std::endl;
        return 1;
    }

    std::string enml;
    std::string line;
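    // Note: getline() strips the line terminators, so the DTD ends up
    // concatenated into one long line.
    while (std::getline(f, line))
    {
        enml += line;
    }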

    f.close();

    // Init parser options
    xmlInitParser();
    xmlSubstituteEntitiesDefault(1);
    xmlLoadExtDtdDefaultValue = 1;

    // Parse DTD from memory
    xmlParserInputBufferPtr pBuf = xmlParserInputBufferCreateMem(enml.c_str(), enml.size(),
                                                             XML_CHAR_ENCODING_UTF8);
    if (!pBuf) {
        std::cerr << "can't allocate input buffer for dtd validation" << std::endl;
        return 2;
    }

    // xmlIOParseDTD frees pBuf itself, so it must not be freed here
    xmlDtdPtr pDtd = xmlIOParseDTD(NULL, pBuf, XML_CHAR_ENCODING_UTF8);
    if (!pDtd) {
        std::cerr << "can't parse dtd from buffer" << std::endl;
        return 3;
    }

    std::cout << "Successfully parsed DTD" << std::endl;
    xmlFreeDtd(pDtd);
    return 0;
}
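
As a side note on the read loop in the test: a minimal sketch of reading the file verbatim, line breaks included, could look like the following. The helper name readWholeFile is hypothetical and not part of the original test; it only uses the standard library.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the original test): returns the file
// contents unchanged, including line breaks. Returns an empty string if
// the file cannot be opened.
std::string readWholeFile(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) {
        return std::string();
    }
    std::ostringstream contents;
    contents << in.rdbuf();  // copy the whole stream buffer in one go
    return contents.str();
}
```

The returned string could then be passed to xmlParserInputBufferCreateMem in the same way as enml above.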

The enml2.dtd file can be downloaded from here: http://xml.evernote.com/pub/enml2.dtd

Build (on Linux in my case):

g++ -I/usr/include/libxml2 main.cpp -o libxml2-test -lxml2

Run:

./libxml2-test 
I/O warning : failed to load HTTP resource
n 1 for XHTML//EN"   "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">%HTMLlat1;
                                                                               ^
 %HTMLlat1; 
           ^
I/O warning : failed to load HTTP resource
for XHTML//EN"   "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">%HTMLsymbol;
                                                                               ^
 %HTMLsymbol; 
         ^
I/O warning : failed to load HTTP resource
for XHTML//EN"   "http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">%HTMLspecial;
                                                                               ^
 %HTMLspecial; 
              ^
Successfully parsed DTD

I'm using libxml2 2.9.1+dfsg1-3ubuntu4.4 on Linux Mint 17 (which corresponds to Ubuntu 14.04).

Update: I observe the same behavior with libxml2 2.9.0 on OS X 10.9. Moreover, the xmllint command-line utility fails to fetch these external entities in precisely the same way as my example code, even if I use the --loaddtd option to explicitly allow fetching the external DTD. Either I am missing something about how this is supposed to work, or I have run into a bug in libxml2.


Solution

  • It appears the problem is not in libxml2 but on the W3C site that the external entities in this DTD file point to. More details can be found in the answer to a similar question. I worked around the problem by downloading the .ent files from those links with a browser and pasting their full contents into the DTD file in place of the references.
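
The manual inlining described above could also be done in code before the DTD text is handed to libxml2. The following is only a rough sketch under stated assumptions: the helper name inlineParameterEntity is hypothetical, the declaration is assumed to be written as "<!ENTITY % NAME ...>" with single spaces, and the quoted identifiers are assumed not to contain '>'. It does plain string surgery and does not use libxml2 at all.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: removes the declaration "<!ENTITY % NAME ...>" from
// the DTD text and replaces the first "%NAME;" reference with `content`
// (the text of the locally saved .ent file).
// Assumptions: single spaces in "<!ENTITY % NAME", and no '>' inside the
// quoted public/system identifiers.
std::string inlineParameterEntity(const std::string& dtd,
                                  const std::string& name,
                                  const std::string& content)
{
    std::string out = dtd;

    // Step 1: drop the external entity declaration.
    const std::string declStart = "<!ENTITY % " + name;
    std::size_t pos = out.find(declStart);
    while (pos != std::string::npos) {
        const std::size_t after = pos + declStart.size();
        // Accept only a full-name match (e.g. not "HTMLlat1x" for "HTMLlat1").
        if (after < out.size() &&
            std::isspace(static_cast<unsigned char>(out[after]))) {
            const std::size_t end = out.find('>', after);
            if (end != std::string::npos) {
                out.erase(pos, end - pos + 1);
            }
            break;
        }
        pos = out.find(declStart, after);
    }

    // Step 2: put the entity file contents where the reference was.
    const std::string ref = "%" + name + ";";
    pos = out.find(ref);
    if (pos != std::string::npos) {
        out.replace(pos, ref.size(), content);
    }
    return out;
}
```

With the three .ent files saved locally, this would be called once per entity (HTMLlat1, HTMLsymbol, HTMLspecial), and the resulting string passed to xmlParserInputBufferCreateMem, so that no HTTP fetch is needed during parsing.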