BS4: removing <a> tags


I'm using BeautifulSoup 4 and have the HTML below:

<tr>
  <td>London <a href="/company/mcrt/5" target="_blank">10 vol</a> 54 page</td>
</tr>

I'm trying to remove just the "a" tag and keep the text inside, like this:

<tr>
  <td>London 10 vol 54 page</td>
</tr>

Is there any way to do it with bs4?


Solution

  • You are looking for the .unwrap() method, which replaces a tag with its contents:

    txt = '''<tr>
      <td>London <a href="/company/mcrt/5" target="_blank">10 vol</a> 54 page</td>
    </tr>'''
    
    from bs4 import BeautifulSoup
    
    soup = BeautifulSoup(txt, 'html.parser')
    
    soup.a.unwrap()
    
    print(soup)
    

    Prints:

    <tr>
    <td>London 10 vol 54 page</td>
    </tr>
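
Note that soup.a only refers to the first <a> tag in the document, and it is None when there is no <a> at all. If the real table has several links, a minimal sketch along these lines should unwrap all of them. The second <td> row and its href are made up here purely for illustration; they are not part of the question.

```python
from bs4 import BeautifulSoup

# The second <td> is a hypothetical extra row added for illustration.
html = '''<tr>
  <td>London <a href="/company/mcrt/5" target="_blank">10 vol</a> 54 page</td>
  <td>Paris <a href="/company/mcrt/6">3 vol</a> 12 page</td>
</tr>'''

soup = BeautifulSoup(html, 'html.parser')

# find_all() returns a list-like result collected up front,
# so modifying the tree inside the loop is safe.
for a in soup.find_all('a'):
    a.unwrap()  # drop the <a> tag itself, keep its text in place

result = str(soup)
print(result)
```

Because find_all('a') returns an empty result when nothing matches, the loop is simply skipped in that case, which avoids the AttributeError that soup.a.unwrap() would raise on a document without links.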