python xhtml pisa

xhtml support in pisa v3.0.33


I am trying to convert HTML to PDF using pisa, with the following line of code:

pisa.CreatePDF(htmlCode, pdfFile, xhtml=True )

I get the following error: pdf creation failed with error "'module' object has no attribute 'XHTMLParser'"

I have html5lib 1.0b3 installed. This used to work, but something changed (maybe I updated some of the modules). Does anyone know why I keep getting the above error?

When I do not pass xhtml=True, the call succeeds, but the generated PDF is all wrong. Can I get around this somehow? Is it possible to convert a web page from XHTML to HTML?

How can I tell whether a particular page is XHTML or not?

The last two questions might not make sense because I do not write html code and can only read it.

Thanks for any help.


Solution

  • There is no XHTMLParser in html5lib's html5parser module, and a TODO comment in the pisa source code acknowledges this, so the xhtml=True code path cannot work with this html5lib version:

    if xhtml:
        #TODO: XHTMLParser doesn't see to exist...
        parser = html5lib.XHTMLParser(tree=treebuilders.getTreeBuilder("dom"))
    

    Fortunately, XHTML is often valid HTML as well, so you usually don't need any conversion: drop the xhtml=True flag and let pisa parse the page as HTML. XHTML is not the problem here, so the next step is to find out why the PDF generated without the flag comes out wrong.
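
On the question of how to tell whether a page is XHTML: a rough check is to look for an XHTML DOCTYPE or the XHTML namespace on the html element. The sketch below is only a heuristic and not part of pisa or html5lib; the function name looks_like_xhtml and the head_chars parameter are made up for illustration, and a page can declare itself as XHTML without being well-formed.

```python
import re

# Namespace URI that XHTML documents declare on their <html> element.
XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"

# Matches a DOCTYPE whose declaration mentions XHTML, e.g.
# <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" ...>
_XHTML_DOCTYPE = re.compile(r"<!DOCTYPE\s+html[^>]*\bXHTML\b", re.IGNORECASE)

# Matches an <html> start tag that declares the XHTML namespace.
_XHTML_XMLNS = re.compile(
    r"<html\b[^>]*\bxmlns\s*=\s*[\"']" + re.escape(XHTML_NAMESPACE) + r"[\"']",
    re.IGNORECASE,
)


def looks_like_xhtml(markup, head_chars=2048):
    """Heuristically report whether markup declares itself as XHTML.

    Hypothetical helper: only the first head_chars characters are
    inspected, because the DOCTYPE and <html> tag sit at the top.
    """
    head = markup[:head_chars]
    return bool(_XHTML_DOCTYPE.search(head) or _XHTML_XMLNS.search(head))
```

Either way, the result should not change how pisa is called: pisa.CreatePDF(htmlCode, pdfFile) without xhtml=True is the call that actually runs, so the check is mainly useful for understanding what kind of page you are dealing with.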