python html beautifulsoup findall

Finding all div elements with varying id value with BeautifulSoup


This question must be a duplicate, but for the life of me, I can't find it anywhere.

from bs4 import BeautifulSoup

html = """
<html>
<head>
</head>
<body>
<div id="7471292"></div>
<div id="5235252"></div>
<div href="/some/link/"></div>
<div id="7567327"></div>
<div id="1231312"></div>
<div class="card d-inline-block iteml_card elems3 section1 featured0 wished0"></div>
<div id="2342424"></div>
</body>
</html>
"""

# Create soup from html; naming the parser avoids the "no parser specified" warning
soup = BeautifulSoup(html, "html.parser")

I want the following output:

[<div id="7471292"></div>,
 <div id="5235252"></div>,
 <div id="7567327"></div>,
 <div id="1231312"></div>,
 <div id="2342424"></div>]

We can do something like:

soup.find_all("div")

but this will return all divs. If we want to filter on the id attribute, we seemingly have to supply a concrete value as well, which makes the filter useless here:

soup.find_all('div', {'id': ""})

Solution

  • What happens?

    You are close, but soup.find_all('div', {'id': ""}) asks for an id attribute that is empty or missing, which is why you won't get the ResultSet you expect.

    How to fix?

    Not much needs to change, and you do not really need a regex in this case. Just use a keyword argument and set the attribute to True, which matches any tag that has the attribute, whatever its value:

    soup.find_all('div', id=True)
    

    With dict syntax:

    soup.find_all('div', {'id':True})
    

    Or the equivalent CSS selector:

    soup.select('div[id]')
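
A minimal sketch to check that the three forms above select the same elements. The `sample` markup and the variable names are made up for illustration, and `html.parser` is chosen only because it ships with Python:

```python
from bs4 import BeautifulSoup

# Hypothetical document: two divs with an id, one without.
sample = '<div id="a1"></div><div class="card"></div><div id="b2"></div>'
soup = BeautifulSoup(sample, "html.parser")

by_keyword = soup.find_all("div", id=True)    # keyword argument
by_dict = soup.find_all("div", {"id": True})  # attrs dictionary
by_css = soup.select("div[id]")               # CSS attribute selector

# Collect the id values found by each approach so they can be compared.
ids_keyword = [tag["id"] for tag in by_keyword]
ids_dict = [tag["id"] for tag in by_dict]
ids_css = [tag["id"] for tag in by_css]
print(ids_keyword, ids_dict, ids_css)
```

Which form to use is mostly a matter of taste; the dict syntax is handy for attribute names that are not valid Python keyword arguments.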
    

    Example

    from bs4 import BeautifulSoup

    html = """
    <html>
    <head>
    </head>
    <body>
    <div id="7471292"></div>
    <div id="5235252"></div>
    <div href="/some/link/"></div>
    <div id="7567327"></div>
    <div id="1231312"></div>
    <div class="card d-inline-block iteml_card elems3 section1 featured0 wished0"></div>
    <div id="2342424"></div>
    </body>
    </html>
    """
    
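    # Create soup from html; naming the parser avoids the "no parser specified" warning
    soup = BeautifulSoup(html, "html.parser")
    # id=True keeps every div that has an id attribute, whatever its value
    soup.find_all('div', {'id':True})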
    

    Output

    [<div id="7471292"></div>,
     <div id="5235252"></div>,
     <div id="7567327"></div>,
     <div id="1231312"></div>,
     <div id="2342424"></div>]
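
If the page also contained divs with non-numeric ids that you wanted to skip, a regex would become useful after all. The following is only a sketch under that assumption; the `sample` markup (including the `sidebar` id) and the variable names are invented for illustration and are not part of the question:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical document: one numeric id, one non-numeric id, one div without id.
sample = (
    '<div id="7471292"></div>'
    '<div id="sidebar"></div>'
    '<div href="/some/link/"></div>'
)
soup = BeautifulSoup(sample, "html.parser")

# A compiled pattern is matched against each id value; ^\d+$ keeps ids
# made up of digits only. Divs without an id never match.
numeric_divs = soup.find_all("div", id=re.compile(r"^\d+$"))

# Pull the id strings out of the matched tags.
numeric_ids = [tag["id"] for tag in numeric_divs]
print(numeric_ids)
```

For the HTML in the question, where every id is numeric, this returns the same divs as id=True, so the simpler form is enough.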