I'd like to add ids to html tags. For example, I'd like to change:
<p>First paragraph</p>
<p>Second paragraph</p>
<p>Third paragraph</p>
to
<p id="1">First paragraph</p>
<p id="2">Second paragraph</p>
<p id="3">Third paragraph</p>
IIRC, it's possible to use a lambda function to achieve this functionality, but I can't remember the exact syntax.
I would use an HTML parser, like BeautifulSoup
.
The idea is to iterate over all paragraphs using enumerate()
for indexing, starting with 1
:
from bs4 import BeautifulSoup
data = """
<p>First paragraph</p>
<p>Second paragraph</p>
<p>Third paragraph</p>
"""
soup = BeautifulSoup(data, 'html.parser')
for index, p in enumerate(soup.find_all('p'), start=1):
p['id'] = index
print soup
Prints:
<p id="1">First paragraph</p>
<p id="2">Second paragraph</p>
<p id="3">Third paragraph</p>