For example, there is this HTML code:
<p>Example of a paragraph element.</p>
<ul>
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
</ul>
It needs to be represented as YAML (JSON is also fine), like this:
p: Example of a paragraph element.
ul:
  - li: Coffee
  - li: Tea
  - li: Milk
I'm not sure there is a package for this, but you could iterate through every tag in the HTML, use each tag's .name and .text to build the lines you want, and then write the result to a file. Printing each line looks like this:
from bs4 import BeautifulSoup

html = '''<p>Example of a paragraph element.</p>
<ul>
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
</ul>'''

soup = BeautifulSoup(html, 'html.parser')
# find_all() with no arguments returns every tag, parents before their children
for tag in soup.find_all():
    print(tag.name + ':' + tag.text)
Output:
p:Example of a paragraph element.
ul:
Coffee
Tea
Milk
li:Coffee
li:Tea
li:Milk
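The flat loop above prints the ul's text and then repeats each li, so it doesn't give you the nested structure directly. Below is a minimal sketch that builds a nested structure instead and dumps it as JSON. To keep it dependency-free it uses the standard library's html.parser.HTMLParser rather than BeautifulSoup. The class and function names (TreeBuilder, html_to_tree) are hypothetical, and it assumes well-formed HTML without void elements such as <br> or <img>, since those have no closing tag.

```python
import json
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Hypothetical helper: builds nested {tag: content} dicts from HTML.

    Assumes well-formed markup with no void elements (e.g. <br>).
    A leaf element maps to its stripped text; an element with child tags
    maps to a list of its children (any loose text beside them is dropped).
    """

    def __init__(self):
        super().__init__()
        self.root = []   # top-level elements in document order
        self.stack = []  # open elements as (tag, children, text_parts)

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, [], []))

    def handle_data(self, data):
        # Ignore whitespace-only text such as the newlines between tags
        if self.stack and data.strip():
            self.stack[-1][2].append(data.strip())

    def handle_endtag(self, tag):
        if not self.stack:
            return
        name, children, text = self.stack.pop()
        node = {name: children if children else ' '.join(text)}
        # Attach to the enclosing element, or to the top level
        parent = self.stack[-1][1] if self.stack else self.root
        parent.append(node)


def html_to_tree(html):
    parser = TreeBuilder()
    parser.feed(html)
    parser.close()
    return parser.root


html = '''<p>Example of a paragraph element.</p>
<ul>
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
</ul>'''

tree = html_to_tree(html)
print(json.dumps(tree, indent=2))
# To write it to a file instead: json.dump(tree, f, indent=2) on an open file f
```

For the sample HTML, tree should be [{'p': 'Example of a paragraph element.'}, {'ul': [{'li': 'Coffee'}, {'li': 'Tea'}, {'li': 'Milk'}]}]. Each element becomes a one-key dict inside a list rather than a plain mapping, because repeated li keys in a single mapping are not valid JSON objects or YAML mappings. If PyYAML is installed, yaml.safe_dump(tree) should give you the YAML version of the same structure.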