I'm stuck on a problem. There are probably several ways to solve it, but I would like to know whether my approach can be made to work.
I have an XML file that references one external DTD in its DOCTYPE declaration.
The XML files are parsed with Python (lxml), so they can be validated automatically against the DTD named in the DOCTYPE. The external DTD is reachable via an internet address, but that site redirects every request to HTTPS. As a result, the parser cannot load the external DTD.
Thanks to a friend of mine I was able to use an old, unused website that still runs on plain HTTP. The DTD stored on that website can be found and used by the parser.
Now for my question: is it possible to use an external DTD with Python and lxml when the DTD is only accessible via an HTTPS server? Unfortunately I have no way of setting up an area on the server that is served over HTTP.
I've already tried to get the external DTD via an HTTP request but it gets redirected to the HTTPS port.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE book PUBLIC "-//AA//Test//EN" "***">
<!-- <!DOCTYPE book PUBLIC "-//AA//Test//EN" "***"> -->
<book>
<book-meta>
<book-id pub-id-type="other">handbook</book-id>
<book-title-group Id="1">
<book-title name="Hallo">The NCBI Handbook</book-title>
</book-title-group>
</book-meta>
</book>
For completeness here is an example DTD.
<!ELEMENT book ANY>
<!ATTLIST book
Release CDATA "v0.0.1"
>
<!ELEMENT book-meta ANY> <!-- # related objects: 0 -->
<!ATTLIST book-meta
Value CDATA "Das ist eine Information"
>
<!ELEMENT book-id ANY> <!-- # related objects: 0 -->
<!ATTLIST book-id
pub-id-type CDATA #REQUIRED
>
<!ELEMENT book-title-group ANY> <!-- # related objects: 0 -->
<!ATTLIST book-title-group
Id CDATA #IMPLIED
>
<!ELEMENT book-title ANY> <!-- # related objects: 0 -->
<!ATTLIST book-title
name CDATA #REQUIRED
>
To parse the XML files I use a Python script with the lxml library. Here is the test program.
from lxml import etree

# Load the external DTD named in the DOCTYPE, validate against it,
# and fill in attribute defaults; network access is allowed for the DTD.
myParser = etree.XMLParser(attribute_defaults=True, dtd_validation=True, load_dtd=True, no_network=False)
xmlFile = etree.parse("XML_DTDValidation.xml", parser=myParser)
xmlFile.xinclude()  # expand any XInclude references
xmlFile.write("XML_DTDValidation_out.xml", method="xml", xml_declaration=True, encoding="utf-8", pretty_print=True)
I hope I could summarize my problem well and someone can help me.
This page describes some ways to work around this.
You can either:
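One workaround that stays within lxml is a custom resolver, shown here as a minimal sketch rather than a definitive solution. It relies on the fact that libxml2, which lxml uses underneath, does not fetch resources over HTTPS itself, so the resolver downloads the DTD with urllib and hands the bytes back to the parser. The names HTTPSDTDResolver, fetch_dtd and make_parser are made up for this example; etree.Resolver, resolve_string and parser.resolvers.add are the lxml hooks it builds on. It assumes the DOCTYPE system identifier is an https:// URL.

```python
import urllib.request

from lxml import etree


class HTTPSDTDResolver(etree.Resolver):
    """Hypothetical resolver: loads https:// DTDs with urllib instead of libxml2."""

    def fetch_dtd(self, url):
        # Download the DTD as bytes; urllib handles TLS, unlike libxml2.
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read()

    def resolve(self, system_url, public_id, context):
        # Only take over https:// URLs; returning None falls back to
        # lxml's default loading for everything else.
        if system_url and system_url.startswith("https://"):
            return self.resolve_string(self.fetch_dtd(system_url), context)
        return None


def make_parser(resolver=None):
    # Same parser options as in the question, plus the custom resolver.
    parser = etree.XMLParser(
        attribute_defaults=True,
        dtd_validation=True,
        load_dtd=True,
        no_network=False,
    )
    parser.resolvers.add(resolver or HTTPSDTDResolver())
    return parser
```

With this in place, the test program from the question would call etree.parse("XML_DTDValidation.xml", parser=make_parser()) and leave the DOCTYPE pointing at the HTTPS address. Overriding fetch_dtd in a subclass is one way to add caching or to serve a local copy of the DTD.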