With this code:
from lxml import etree
with open( 'C:\\Python33\\projects\\xslt', 'r' ) as xslt, open( 'C:\\Python33\\projects\\result', 'a+' ) as result, open( 'C:\\Python33\\projects\\xml', 'r' ) as xml:
s_xml = xml.read()
s_xslt = xslt.read()
transform = etree.XSLT(etree.XML(s_xslt))
out = transform(etree.XML(s_xml))
result.write(out)
I get this error:
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
from projects.xslt_transform import trans
File ".\projects\xslt_transform.py", line 17, in <module>
transform = etree.XSLT(etree.XML(s_xslt))
File "xslt.pxi", line 409, in lxml.etree.XSLT.__init__ (src\lxml\lxml.etree.c:150256)
lxml.etree.XSLTParseError: Invalid expression
this couple xml/xslt files works with other tools.
Also I had to get rid of the encoding attribute in the top declarations for both files in order not to get:
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
can it be related ?
EDIT:
this does not work either (i get the same error):
with open( 'C:\\Python33\\projects\\xslt', 'r',encoding="utf-8" ) as xslt, open( 'C:\\Python33\\projects\\result', 'a+',encoding="utf-8" ) as result, open( 'C:\\Python33\\projects\\xml', 'r',encoding="utf-8" ) as xml:
s_xml = etree.parse(BytesIO(bytes(xml.read(),'UTF-8')))
s_xslt = etree.parse(BytesIO(bytes(xslt.read(),'UTF-8')))
transform = etree.XSLT(s_xslt)
out = transform(s_xml)
print(out.tostring())
reading lxml source code: this returns an exception:
xslt.xsltParseStylesheetDoc(c_doc)
so it seems an actual parse error. Can it be namespace related ?
EDIT SOLVED:
s_xml = etree.parse(xml.read())
s_xslt = etree.parse(xslt.read())
thanks tomalak
Parsing XML is more complicated than "open a text file, stuff the resulting string into etree".
XML files are serialized representations of a DOM tree. They are not to be handled as text even though they come in the shape of a text file. They come in multiple byte encodings and finding out which encoding a certain file uses is anything but trivial.
XML parsers have proper detection mechanisms built in and therefore they should be used to open XML files. The the basic open()
+ read()
calls are not enough to correctly handle the file contents.
lxml.etree
provides the parse()
function that can accept a number of argument types:
- an open file object (make sure to open it in binary mode)
- a file-like object that has a .read(byte_count) method returning a byte string on each call
- a filename string
- an HTTP or FTP URL string
and then will correctly parse the associated document back into a DOM tree.
Your code should look more like this:
from lxml import etree
f_xsl = 'C:\\Python33\\projects\\xslt'
f_xml = 'C:\\Python33\\projects\\xml'
f_out = 'C:\\Python33\\projects\\result'
transform = etree.XSLT(etree.parse(f_xsl))
result = transform(etree.parse(f_xml))
result.write(f_out)