python html regex html-parsing auto-increment

Python: How can I add a counter to the replacement argument of re.sub()

I'd like to add ids to html tags. For example, I'd like to change:

<p>First paragraph</p>
<p>Second paragraph</p>
<p>Third paragraph</p>

to:

<p id="1">First paragraph</p>
<p id="2">Second paragraph</p>
<p id="3">Third paragraph</p>

IIRC, it's possible to use a lambda function to achieve this functionality, but I can't remember the exact syntax.
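The lambda approach works because re.sub() accepts a callable as its replacement argument. It calls that callable once per match, from left to right, and substitutes whatever string it returns. A minimal sketch, assuming the input only contains bare `<p>` opening tags with no attributes, keeps the counter in an itertools.count object:

```python
import itertools
import re

html = """<p>First paragraph</p>
<p>Second paragraph</p>
<p>Third paragraph</p>"""

# itertools.count(1) yields 1, 2, 3, ... each time next() is called on it
counter = itertools.count(1)

# re.sub calls the lambda once per match (left to right) and uses its return value;
# "<p>" never matches inside "</p>" because of the slash
result = re.sub(r"<p>", lambda m: '<p id="{}">'.format(next(counter)), html)
print(result)
```

Because the counter object lives outside the lambda, it must be recreated if the substitution is run again and the ids should restart at 1.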

Solution

I would use an HTML parser, such as BeautifulSoup, rather than a regular expression.

The idea is to iterate over all <p> elements with enumerate(), starting the count at 1:

from bs4 import BeautifulSoup

data = """
<p>First paragraph</p>
<p>Second paragraph</p>
<p>Third paragraph</p>
"""

soup = BeautifulSoup(data, 'html.parser')

# Number every <p> element, counting from 1
for index, p in enumerate(soup.find_all('p'), start=1):
    p['id'] = str(index)  # HTML attribute values are strings

print(soup)

Prints:

<p id="1">First paragraph</p>
<p id="2">Second paragraph</p>
<p id="3">Third paragraph</p>
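If a regex is still preferred (for example, to avoid the dependency), a slightly more tolerant pattern can also match opening tags that already carry attributes. The sketch below uses a hypothetical helper, add_paragraph_ids(), and assumes the input has no `<p` inside comments, scripts, or attribute values, and no <p> tags that already have an id. Those are exactly the cases where a parser like BeautifulSoup is the safer choice.

```python
import itertools
import re

def add_paragraph_ids(html, start=1):
    """Hypothetical helper: insert id="N" into every <p ...> opening tag."""
    # A fresh counter per call, so numbering restarts at `start` each time
    counter = itertools.count(start)
    # Match "<p" only when followed by whitespace, "/" or ">",
    # so tags such as <pre> are skipped and closing </p> tags never match
    return re.sub(r"<p(?=[\s/>])",
                  lambda m: '<p id="{}"'.format(next(counter)),
                  html)

result = add_paragraph_ids('<p>First</p>\n<p class="intro">Second</p>\n<pre>code</pre>')
print(result)
```

Here the second paragraph becomes `<p id="2" class="intro">`, while `<pre>` is left untouched.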