I can't get libxml2 to properly parse a DTD from memory. The DTD references external XHTML entity files hosted on the W3C site (www.w3.org). The links work, and a browser loads their content just fine. However, libxml2 reports failures to load the HTTP resources, even though the xmlIOParseDTD function returns success.
Here's a minimal test that reproduces the problem:
    #include <libxml/parser.h>
    #include <libxml/xmlIO.h>
    #include <fstream>
    #include <iostream>
    #include <string>

    int main()
    {
        // Read the DTD from a file into a string
        std::ifstream f("enml2.dtd");
        if (!f.is_open()) {
            std::cerr << "Can't open enml2.dtd file" << std::endl;
            return 1;
        }
        std::string enml;
        std::string line;
        while (std::getline(f, line)) {
            enml += line;
        }
        f.close();

        // Init parser options: substitute entities, load external DTD subsets
        xmlInitParser();
        xmlSubstituteEntitiesDefault(1);
        xmlLoadExtDtdDefaultValue = 1;

        // Parse the DTD from memory
        xmlParserInputBufferPtr pBuf = xmlParserInputBufferCreateMem(
            enml.c_str(), static_cast<int>(enml.size()), XML_CHAR_ENCODING_UTF8);
        if (!pBuf) {
            std::cerr << "can't allocate input buffer for dtd validation" << std::endl;
            return 2;
        }
        // xmlIOParseDTD takes ownership of pBuf and frees it in any case
        xmlDtdPtr pDtd = xmlIOParseDTD(NULL, pBuf, XML_CHAR_ENCODING_UTF8);
        if (!pDtd) {
            std::cerr << "can't parse dtd from buffer" << std::endl;
            return 3;
        }
        std::cout << "Successfully parsed DTD" << std::endl;
        xmlFreeDtd(pDtd);
        xmlCleanupParser();
        return 0;
    }
The enml2.dtd file used by the test can be downloaded from http://xml.evernote.com/pub/enml2.dtd
Build (on Linux in my case):

    g++ -I/usr/include/libxml2 main.cpp -o libxml2-test -lxml2

Run:

    ./libxml2-test
    I/O warning : failed to load HTTP resource
    n 1 for XHTML//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">%HTMLlat1;
    ^
    %HTMLlat1;
    ^
    I/O warning : failed to load HTTP resource
    for XHTML//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">%HTMLsymbol;
    ^
    %HTMLsymbol;
    ^
    I/O warning : failed to load HTTP resource
    for XHTML//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">%HTMLspecial;
    ^
    %HTMLspecial;
    ^
    Successfully parsed DTD
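A side note on the log above: the context lines show the entity declaration and the following %HTMLlat1; reference joined on one line. That is most likely because the getline loop in the test drops line breaks, so the whole DTD is concatenated into a single line. It is not the cause of the HTTP failures (xmllint behaves the same way on the unmodified file), but it makes diagnostics harder to read. A minimal sketch of reading the file verbatim with only the standard library; the helper name readWholeFile is hypothetical:

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads the entire file into a string, keeping the
// original line breaks, so parser diagnostics match the DTD's real layout.
std::string readWholeFile(const std::string& path)
{
    std::ifstream f(path, std::ios::in | std::ios::binary);
    if (!f) {
        throw std::runtime_error("can't open " + path);
    }
    std::ostringstream ss;
    ss << f.rdbuf();  // copy the whole stream buffer, newlines included
    return ss.str();
}
```

In the test, std::string enml = readWholeFile("enml2.dtd"); would then replace the getline loop; the rest of the code stays unchanged.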
I'm using libxml2 2.9.1+dfsg1-3ubuntu4.4 on Linux Mint 17 (based on Ubuntu 14.04).
Update: I see the same behavior with libxml2 2.9.0 on OS X 10.9. Moreover, the xmllint command-line utility fails to fetch these external entities in exactly the same way as my example code, even when I pass the --loaddtd option to explicitly allow fetching the external DTD. Either I'm missing something about how this is supposed to work, or I've hit a bug in libxml2.
It appears the problem is not in libxml2 but in the W3C site that the external entities in this DTD point to. More details can be found in the answer to a similar question. I worked around the problem by downloading the .ent files from those links with a browser and pasting their full contents into the DTD file in place of the references.
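The same workaround can be automated if you'd rather not edit enml2.dtd by hand. Below is a minimal sketch using only the standard library: it replaces each reference like %HTMLlat1; in the DTD text with the contents of a locally saved .ent file before the string is handed to xmlParserInputBufferCreateMem. The helper name inlineParameterEntity is hypothetical, and the sketch assumes libxml2 only tries to fetch an external parameter entity when it is referenced, so the <!ENTITY % HTMLlat1 ...> declaration itself is left in place.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every reference to the parameter entity
// `name` (written "%name;") with `contents`, e.g. the text of a saved
// xhtml-lat1.ent. The declaration "<!ENTITY % name ..." (note the space
// after '%') does not match and is left untouched.
// Returns the number of references replaced.
std::size_t inlineParameterEntity(std::string& dtd,
                                  const std::string& name,
                                  const std::string& contents)
{
    const std::string ref = "%" + name + ";";
    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = dtd.find(ref, pos)) != std::string::npos) {
        dtd.replace(pos, ref.size(), contents);
        pos += contents.size();  // continue after the inserted text
        ++count;
    }
    return count;
}
```

It would be called once each for HTMLlat1, HTMLsymbol and HTMLspecial, with the contents of the corresponding .ent file loaded however is convenient. Continuing the search after the inserted text means the helper never rescans what it just inserted, so it cannot loop even if an .ent file happened to contain the same reference.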