Custom indent width for BeautifulSoup .prettify()


Is there any way to define a custom indent width for the .prettify() function? From what I can gather from its source -

def prettify(self, encoding=None, formatter="minimal"):
    if encoding is None:
        return self.decode(True, formatter=formatter)
    else:
        return self.encode(encoding, True, formatter=formatter)

There is no way to specify indent width. I think it's because of this line in the decode_contents() function -

s.append(" " * (indent_level - 1))

Which has a fixed length of 1 space! (WHY!!) I tried specifying indent_level=4, but that just results in this -

    <section>
     <article>
      <h1>
      </h1>
      <p>
      </p>
     </article>
    </section>

Which looks just plain stupid. :|

Now, I can hack this away, but I want to be sure I'm not missing anything. Because this should be a basic feature. :-/

If you have a better way of prettifying HTML, let me know.


Solution

  • I actually dealt with this myself, in the hackiest way possible: by post-processing the result.

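    import re
    r = re.compile(r'^( +)', re.MULTILINE)  # leading spaces only, so blank lines are untouched
    def prettify_2space(s, encoding=None, formatter="minimal"):
        # Double each line's indent. This needs str output, so leave encoding as None.
        return r.sub(r'\1\1', s.prettify(encoding, formatter))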
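
To see what the substitution does without involving BeautifulSoup at all, here is a minimal sketch that runs it on a hand-written string indented one space per level, the way prettify() output is. The name reindent and the sample text are made up for illustration, and the pattern matches leading spaces only.

```python
import re

# Leading spaces at the start of each line (MULTILINE makes ^ match per line).
_LEADING_SPACES = re.compile(r'^( +)', re.MULTILINE)

def reindent(text, indent_width=2):
    """Widen a 1-space-per-level indent to indent_width spaces per level."""
    # r'\1' * 3 is r'\1\1\1', so each run of leading spaces is repeated.
    return _LEADING_SPACES.sub(r'\1' * indent_width, text)

sample = "<section>\n <article>\n  <h1>\n  </h1>\n </article>\n</section>"
print(reindent(sample, 4))
```

Because the indent is simply repeated, this only gives the expected result when the input really uses one space per nesting level.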
    

    Actually, I monkeypatched prettify_2space in place of prettify in the class. That's not essential to the solution, but let's do it anyway, and make the indent width a parameter instead of hardcoding it to 2:

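    import re
    import bs4

    orig_prettify = bs4.BeautifulSoup.prettify
    r = re.compile(r'^( +)', re.MULTILINE)  # leading spaces only
    def prettify(self, encoding=None, formatter="minimal", indent_width=4):
        # Repeat each line's 1-space-per-level indent indent_width times.
        # Works on str output only, so leave encoding as None.
        return r.sub(r'\1' * indent_width, orig_prettify(self, encoding, formatter))
    bs4.BeautifulSoup.prettify = prettify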
    

    So:

    x = '''<section><article><h1></h1><p></p></article></section>'''
    # No parser is named, so bs4 picks the best one installed; the output below matches lxml.
    soup = bs4.BeautifulSoup(x)
    print(soup.prettify(indent_width=3))
    

    … gives:

    <html>
       <body>
          <section>
             <article>
                <h1>
                </h1>
                <p>
                </p>
             </article>
          </section>
       </body>
    </html>
    

    Obviously if you want to patch Tag.prettify as well as BeautifulSoup.prettify, you have to do the same thing there. (You might want to create a generic wrapper that you can apply to both, instead of repeating yourself.) And if there are any other prettify methods, same deal.
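
The generic wrapper mentioned above could be sketched as a decorator. This is only an illustration under stated assumptions: with_indent_width and FakeTag are hypothetical names, FakeTag stands in for a bs4 class so the snippet runs without bs4 installed, and the bytes branch assumes an ASCII-compatible encoding such as UTF-8.

```python
import functools
import re

# One pattern for str output, one for bytes output (when an encoding is given).
_STR_INDENT = re.compile(r'^( +)', re.MULTILINE)
_BYTES_INDENT = re.compile(rb'^( +)', re.MULTILINE)

def with_indent_width(prettify_func):
    """Wrap a prettify-style method so it accepts an indent_width keyword."""
    @functools.wraps(prettify_func)
    def wrapper(self, encoding=None, formatter="minimal", indent_width=1):
        result = prettify_func(self, encoding, formatter)
        if isinstance(result, bytes):
            # Assumes spaces and newlines are single bytes in this encoding.
            return _BYTES_INDENT.sub(rb'\1' * indent_width, result)
        return _STR_INDENT.sub(r'\1' * indent_width, result)
    return wrapper

class FakeTag:
    """Hypothetical stand-in for a class with a prettify() method."""
    def prettify(self, encoding=None, formatter="minimal"):
        text = "<p>\n <b>\n </b>\n</p>"
        return text if encoding is None else text.encode(encoding)

# Patch the method in place, the same way the answer patches BeautifulSoup.
FakeTag.prettify = with_indent_width(FakeTag.prettify)
print(FakeTag().prettify(indent_width=4))
```

With bs4 itself, the equivalent would be something like bs4.Tag.prettify = with_indent_width(bs4.Tag.prettify). BeautifulSoup is a subclass of Tag, so it is worth checking whether it already picks up the patched method before wrapping it a second time. The default of indent_width=1 is deliberate here: it keeps the original output unless a caller asks for something wider, and it makes an accidental double wrap harmless.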