
How do I generate a table of contents for HTML text in Python?


Assume that I have some HTML code, like this (generated from Markdown or Textile or something):

<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header</h1>
<!-- and so on -->

How could I generate a table of contents for it using Python?


Solution

  • Use an HTML parser such as lxml or BeautifulSoup to find all header elements.
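
A minimal sketch of that idea follows. To keep it runnable without extra installs, it swaps in the standard library's html.parser rather than lxml or BeautifulSoup. With BeautifulSoup the header lookup would be a find_all over the h1..h6 tag names instead. The names HeaderCollector and build_toc are made up for this example, and the indented plain-text output is just one possible TOC format.

```python
from html.parser import HTMLParser

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeaderCollector(HTMLParser):
    """Hypothetical helper: collect (level, text) pairs for h1..h6 elements."""

    def __init__(self):
        super().__init__()
        self.headers = []    # finished (level, text) pairs, in document order
        self._level = None   # level of the header currently open, if any
        self._parts = []     # text fragments seen inside the open header

    def handle_starttag(self, tag, attrs):
        if tag in HEADER_TAGS:
            self._flush()    # tolerate a previous header that was never closed
            self._level = int(tag[1])

    def handle_endtag(self, tag):
        if tag in HEADER_TAGS:
            self._flush()

    def handle_data(self, data):
        # Only keep text that appears while a header is open.
        if self._level is not None:
            self._parts.append(data)

    def _flush(self):
        if self._level is not None:
            # Join fragments and collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            self.headers.append((self._level, text))
        self._level = None
        self._parts = []

    def close(self):
        super().close()
        self._flush()        # pick up a header left open at end of input


def build_toc(html):
    """Return a plain-text TOC, indented two spaces per header level."""
    parser = HeaderCollector()
    parser.feed(html)
    parser.close()
    return "\n".join(
        "  " * (level - 1) + "- " + text for level, text in parser.headers
    )


if __name__ == "__main__":
    sample = """<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header</h1>"""
    print(build_toc(sample))
```

Run on the HTML from the question, this prints the two h1 headers at the left margin with the two h2 headers nested under the first one. To get a clickable TOC you would also need to add id attributes to the headers and emit links to them, which this sketch does not attempt.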