Tags: python, html, python-3.x, beautifulsoup, tags

Delete a specific tag from main soup in BeautifulSoup4 (python)


This is what I have tried: look at soup.div.decompose(). I also tried soup.elements.div.decompose(). This uses content from DataTables, and it is my first time using it, so if there's a better way to achieve what I'm doing, please tell me! Thanks in advance!

import bs4

with open('MapPage.html', 'r', encoding="utf8") as f:
    txt = f.read()
    soup = bs4.BeautifulSoup(txt,"html5lib")

elements = soup.find_all('tr')
elements.pop(0)

def DeleteData(msgID):
    for div in elements:
        ID = div.find('a').contents[0]
        if int(msgID)==int(ID):
            soup.div.decompose()
            return
    print('Failed to delete data from', msgID)

I'm hoping I'll then be able to write the soup back to 'MapPage.html'. The error AttributeError: 'NoneType' object has no attribute 'decompose' is produced. This is the output when printing div: (Link to html file)


Solution

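  • If I understand correctly, you want to decompose() the <tr> that contains a specific value in its <a>.

    The main issue is that you call soup.div.decompose(), which tries to decompose() the first <div> in the soup object instead of your loop variable div. soup.div returns None when the document contains no <div>, which is why you get the AttributeError.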

    Simply use:

    div.decompose()
    

    or, even better, rename the loop variable to something that is not a tag name:

    e.decompose()
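
To make the difference concrete, here is a minimal sketch. The tiny table is made up for illustration and is not the asker's file; it only assumes bs4 is installed. It shows that soup.div is a shortcut for soup.find("div") and has nothing to do with a local variable that happens to be called div:

```python
from bs4 import BeautifulSoup

# Made-up document for illustration: one table row and no <div> anywhere.
html = "<table><tr><td><a href='#'>1</a></td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

# Attribute access on the soup searches the whole document for that tag name.
print(soup.div)          # None, because there is no <div> to find
print(soup.find("div"))  # None as well, it is the same lookup

for div in soup.find_all("tr"):
    # Here div is the loop variable and holds a <tr> Tag,
    # so calling decompose() on it removes that row from the soup.
    div.decompose()

print(soup.find("tr"))   # None, the row is gone
```

Calling soup.div.decompose() in the loop would raise the same AttributeError as in the question, because it is None.decompose().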
    

    Example

    from bs4 import BeautifulSoup
    
    html = '''
    <html><body>
        <h2>Welcome to our collection of community made maps!</h2>
        <table id="example" class="cell-border" style="width:100%">
            <thead>
                <tr><th>ID</th><th>Author</th><th>Content</th><th>Thumbnail</th><th>Download</th><th>Rating</th>
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td><a href="https://discord.com/channels/932741876174454914/932881912714895390/939257309387980851">939257309387980851</a></td>
                    <td>Matter</td><td>Cervinia Source</td><td><img src="https://media.discordapp.net/attachments/932881912714895390/939257307290796062/unknown.png" alt="Cervina Thumb" width="300" height="auto"></td><td><a href="https://discord.com/channels/932741876174454914/932881912714895390/939257309387980851">Download</a></td><td>5</td>
                </tr>
                <tr><td><a href="https://discord.com/channels/932741876174454914/932881912714895390/939257309387980851">939257309387980852</a></td><td>Tea</td><td>Chamonix</td><td><img src="https://media.discordapp.net/attachments/932881912714895390/939257307290796062/unknown.png" alt="Cervina Thumb" width="300" height="auto"></td><td><a href="https://discord.com/channels/932741876174454914/932881912714895390/939257309387980851">Download</a></td><td>5</td></tr>
            </tbody>
        </table>
    </body></html>
    '''
    # Name the parser explicitly; 'html5lib' (as in the question) works here too
    soup = BeautifulSoup(html, 'html.parser')
    elements = soup.select('tr:has(td)')
    
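    def DeleteData(msgID):
        for e in elements:
            # The text of the first <a> in the row is the message ID
            ID = e.find('a').contents[0]
            if int(msgID)==int(ID):
                # Remove this <tr> (and everything inside it) from the soup
                e.decompose()
                return
        # Only reached when no row matched
        print('Failed to delete data from', msgID)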
    
    DeleteData(939257309387980851)
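
The question also mentions writing the soup back to 'MapPage.html'. A possible sketch of that round trip follows. The helper names delete_row and delete_from_file, the short sample table, and the temporary file are all assumptions made for illustration and do not come from the original post. It compares the ID as text instead of calling int(), so a row whose link is not numeric does not raise an error.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

from bs4 import BeautifulSoup

# Shortened stand-in for the real page (hypothetical IDs).
SAMPLE = """<table id="example">
<thead><tr><th>ID</th><th>Author</th></tr></thead>
<tbody>
<tr><td><a href="#">101</a></td><td>Matter</td></tr>
<tr><td><a href="#">102</a></td><td>Tea</td></tr>
</tbody>
</table>"""


def delete_row(soup, msg_id):
    """Remove the first data row whose first link text equals msg_id."""
    for row in soup.select("tr:has(td)"):  # skips the header row
        link = row.find("a")
        if link is not None and link.get_text(strip=True) == str(msg_id):
            row.decompose()
            return True
    return False


def delete_from_file(path, msg_id):
    """Parse the file, delete the row, and write the file back if it changed."""
    path = Path(path)
    soup = BeautifulSoup(path.read_text(encoding="utf8"), "html.parser")
    deleted = delete_row(soup, msg_id)
    if deleted:
        path.write_text(str(soup), encoding="utf8")
    return deleted


# Demo on a temporary copy so no real file is touched.
with TemporaryDirectory() as tmp:
    page = Path(tmp) / "MapPage.html"
    page.write_text(SAMPLE, encoding="utf8")
    first = delete_from_file(page, 101)   # row exists
    second = delete_from_file(page, 999)  # no such row
    remaining = page.read_text(encoding="utf8")

print(first, second)
```

Returning True or False instead of printing lets the caller decide how to report a missing ID.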