
Get second srcset attribute using Beautiful Soup


I'm trying to get the second srcset attribute with Beautiful Soup. The original HTML is as follows:

<picture class="card-picture ratio ratio-4x3">
<source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&amp;rmode=pad&amp;width=640&amp;rmode=pad&amp;width=640&amp;format=webp" type="image/webp"/>
<source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&amp;rmode=pad&amp;width=640&amp;rmode=pad&amp;width=640" type="image/jpeg"/>
<img alt="" class="card-img object-fit-contain is-contain" loading="lazy" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7">
</img>
</picture>

My code:

for result in results:
    imgel = result.find("source", attrs = {'srcset' : True})['srcset']

This returns the first srcset value, but I want the second one (the PNG URL).


Solution

  • Just select all <source> tags and use normal indexing:

    from bs4 import BeautifulSoup
    
    html_source = """\
    <picture class="card-picture ratio ratio-4x3">
    <source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&amp;rmode=pad&amp;width=640&amp;rmode=pad&amp;width=640&amp;format=webp" type="image/webp"/>
    <source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&amp;rmode=pad&amp;width=640&amp;rmode=pad&amp;width=640" type="image/jpeg"/>
    <img alt="" class="card-img object-fit-contain is-contain" loading="lazy" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7">
    </img>
    </picture>"""
    
    soup = BeautifulSoup(html_source, "html.parser")
    
    results = soup.select("picture")
    
    for result in results:
        second_img = result.select("source")[1]
        print(second_img)
    

    Prints:

    <source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&amp;rmode=pad&amp;width=640&amp;rmode=pad&amp;width=640" type="image/jpeg"/>
    

    Or select the <source> whose type is image/jpeg:

    for result in results:
        jpeg_img = result.select_one('source[type="image/jpeg"]')
        print(jpeg_img)
    

    Or, if you want the first <source> that is either JPEG or PNG:

    for result in results:
        img = result.select_one('source[type="image/jpeg"], source[type="image/png"]')
        print(img)
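
The snippets above print the whole <source> tag, while the question asks for the srcset value itself. A minimal sketch of reading the attribute, assuming some <picture> elements may have fewer than two <source> children: the helper names second_srcset and srcset_by_type and the shortened URLs are made up for illustration and are not part of the original markup.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the markup in the question; the URLs are hypothetical.
html_source = """\
<picture class="card-picture">
<source srcset="/media/pic.PNG?width=640&amp;format=webp" type="image/webp"/>
<source srcset="/media/pic.PNG?rmode=pad&amp;width=640" type="image/jpeg"/>
<img alt="" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"/>
</picture>
<picture class="card-picture">
<img alt="" src="/media/no-sources.png"/>
</picture>"""

soup = BeautifulSoup(html_source, "html.parser")


def second_srcset(picture):
    """Return the srcset of the second <source> inside picture, or None."""
    sources = picture.select("source[srcset]")
    if len(sources) < 2:
        # Indexing [1] would raise IndexError here, so bail out instead.
        return None
    return sources[1]["srcset"]


def srcset_by_type(picture, mime_type="image/jpeg"):
    """Return the srcset of the first <source> with the given type, or None."""
    source = picture.select_one(f'source[type="{mime_type}"][srcset]')
    # select_one returns None when nothing matches.
    return source["srcset"] if source is not None else None


urls = [second_srcset(picture) for picture in soup.select("picture")]
print(urls)
```

Beautiful Soup decodes HTML entities in attribute values, so the returned string contains a plain & where the markup has &amp;. Selecting by type tends to be more robust than indexing if the order of the <source> tags can change between pages.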