python html lxml pretty-print

How to Pretty Print HTML to a file, with indentation


I am using lxml.html to generate some HTML. I want to pretty-print the final result (with indentation) to an HTML file. How do I do that?

This is what I have tried so far:

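import lxml.html as lh
from lxml.html import builder as E

# E.CLASS("scroll") returns the attribute dict {"class": "scroll"} for the new element
sliderRoot = lh.Element("div", E.CLASS("scroll"), style="overflow-x: hidden; overflow-y: hidden;")
scrollContainer = lh.Element("div", E.CLASS("scrollContainer"), style="width: 4340px;")
sliderRoot.append(scrollContainer)
# encoding="unicode" makes tostring() return text rather than bytes, so print() works on Python 3
print(lh.tostring(sliderRoot, pretty_print=True, method="html", encoding="unicode"))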

As you can see, I am passing the pretty_print=True argument. I thought that would give indented output, but it doesn't help. This is the output:

<div style="overflow-x: hidden; overflow-y: hidden;" class="scroll"><div style="width: 4340px;" class="scrollContainer"></div></div>


Solution

  • I ended up using BeautifulSoup directly. It is the library that lxml.html.soupparser uses to parse HTML.

    BeautifulSoup has a prettify() method that does what its name says: it returns the HTML as a string with each tag on its own line, indented by nesting depth.

    BeautifulSoup will NOT fix the HTML, so broken code remains broken. But in this case, since the code is generated by lxml, the HTML should at least be well-formed.

    For the example in my question, that looks like this:

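    from bs4 import BeautifulSoup as bs
    # lh and sliderRoot come from the code in the question
    root = lh.tostring(sliderRoot)   # convert the generated HTML to a string
    soup = bs(root, "html.parser")   # parse it; naming the parser avoids a bs4 warning
    prettyHTML = soup.prettify()     # prettify the HTML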
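
The snippet above stops at the prettified string, while the question also asks about getting it into a file. A minimal sketch of the whole round trip follows, assuming Python 3 with bs4 installed. The function name write_pretty_html and the file name slider.html are illustrative and not from the original post. The input is the unindented output shown in the question, pasted in as a plain string so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# The unindented markup from the question, as a plain string.
raw_html = (
    '<div style="overflow-x: hidden; overflow-y: hidden;" class="scroll">'
    '<div style="width: 4340px;" class="scrollContainer"></div></div>'
)


def write_pretty_html(html, path):
    """Parse html, indent it with prettify(), and write the result to path.

    Hypothetical helper for illustration only.
    """
    # "html.parser" is the standard-library parser. Unlike the "lxml"
    # parser, it does not wrap a fragment in <html> and <body> tags.
    soup = BeautifulSoup(html, "html.parser")
    pretty = soup.prettify()
    with open(path, "w", encoding="utf-8") as f:
        f.write(pretty)
    return pretty


# "slider.html" is an assumed output path.
pretty_html = write_pretty_html(raw_html, "slider.html")
print(pretty_html)
```

By default prettify() indents with a single space per nesting level, so the result may look tighter than the output of other formatters.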