
Replace string between two delimiters in html


How can I replace the string located between the delimiters href="" in the following HTML?

<td><a href="https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n" target="_blank">https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n</a></td>
    </tr>

I want to replace this:

href="https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n"

with this:

href="LINK"

Solution

  • For a quick and dirty fix, you can use re.sub() to match the href attribute and replace its value with your own:

    import re
    html = """<td><a href="https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n" target="_blank">https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n</a></td>
        </tr>"""
    # [^"]* matches the old URL up to (but not including) the closing quote
    new_html = re.sub(r'href="[^"]*"', 'href="LINK"', html)
    print(repr(new_html))
    

    Output:

    '<td><a href="LINK" target="_blank">https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n</a></td>\n    </tr>'
    

    But remember that parsing HTML with regular expressions is not recommended, because it breaks on many edge cases (single-quoted or unquoted attribute values, attributes split across lines, several links in one string, and so on). I would only use this as a quick and dirty fix when I know exactly how the input HTML is structured. For a more robust approach, use an HTML parser such as BeautifulSoup.
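A minimal sketch of the parser-based approach, assuming the third-party beautifulsoup4 package is installed (pip install beautifulsoup4) and using Python's built-in "html.parser" backend. It sets the href of every <a> tag that has one to the placeholder "LINK" and leaves the link text untouched.

```python
from bs4 import BeautifulSoup

html = """<td><a href="https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n" target="_blank">https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n</a></td>
    </tr>"""

soup = BeautifulSoup(html, "html.parser")

# href=True restricts the search to <a> tags that actually carry an href attribute
for link in soup.find_all("a", href=True):
    link["href"] = "LINK"  # overwrite only the attribute value

result = str(soup)
print(result)
```

Unlike a regex, the parser does not care about quote style or where the attribute appears in the tag. One caveat: BeautifulSoup re-serializes the markup, so formatting details may change, and an unmatched closing tag such as the trailing </tr> in this fragment may not survive the round trip.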