I'm trying to get the second srcset attribute in Beautiful Soup. The original HTML is as follows:
<picture class="card-picture ratio ratio-4x3">
<source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&rmode=pad&width=640&rmode=pad&width=640&format=webp" type="image/webp"/>
<source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&rmode=pad&width=640&rmode=pad&width=640" type="image/jpeg"/>
<img alt="" class="card-img object-fit-contain is-contain" loading="lazy" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7">
</img>
</picture>
My code:
for result in results:
    imgel = result.find("source", attrs={"srcset": True})["srcset"]
This returns the first srcset value. I want to get the second value, the PNG URL.
Just select all <source> tags and use normal indexing:
from bs4 import BeautifulSoup
html_source = """\
<picture class="card-picture ratio ratio-4x3">
<source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&rmode=pad&width=640&rmode=pad&width=640&format=webp" type="image/webp"/>
<source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&rmode=pad&width=640&rmode=pad&width=640" type="image/jpeg"/>
<img alt="" class="card-img object-fit-contain is-contain" loading="lazy" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7">
</img>
</picture>"""
soup = BeautifulSoup(html_source, "html.parser")
results = soup.select("picture")
for result in results:
    # [1] is the second <source> (list indexing is zero-based)
    second_img = result.select("source")[1]
    print(second_img)
Prints:
<source srcset="/shop/media/L004D000_picture.PNG?context=bWFzdGVyfGltYWdlc3wzMDE3NTN8aW1hZ2UvcG5nfGgwMS9oMjcvODg0ODIyMDYxODc4Mi9MMDA0RDAwMF9waWN0dXJlLlBOR3wyZjRiZWE1NDU2MWU1MjUzMzU5MjAwNGVlYmIzY2MwNGQzODExMDI3NjNkMDE3YjQ4NGMwNjFlMGVkNTU2OWIy&rmode=pad&width=640&rmode=pad&width=640" type="image/jpeg"/>
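If you want the URL string itself rather than the whole tag, and want to stay close to your original find() call, here is a minimal sketch using find_all() with srcset=True and indexing the result. The shortened HTML and the helper name second_srcset are illustrative assumptions, not part of the answer above; the helper returns None instead of raising IndexError when a <picture> has fewer than two sources.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the page markup (the real query strings are much longer)
html_source = """
<picture class="card-picture ratio ratio-4x3">
<source srcset="/shop/media/L004D000_picture.PNG?width=640&format=webp" type="image/webp"/>
<source srcset="/shop/media/L004D000_picture.PNG?width=640" type="image/jpeg"/>
<img alt="" class="card-img" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"/>
</picture>
"""

def second_srcset(picture):
    """Hypothetical helper: return the srcset of the second <source>, or None."""
    # find_all() with srcset=True keeps only <source> tags that carry a srcset attribute
    sources = picture.find_all("source", srcset=True)
    if len(sources) < 2:
        return None  # fewer than two candidates in this <picture>
    return sources[1]["srcset"]

soup = BeautifulSoup(html_source, "html.parser")
for result in soup.select("picture"):
    print(second_srcset(result))
```

Indexing by position is fragile if the site ever reorders its <source> tags, which is why selecting by type (below) can be the safer choice.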
Or, select the <source> whose type is image/jpeg:
for result in results:
    jpeg_img = result.select_one('source[type="image/jpeg"]')
    print(jpeg_img)
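Note that select_one() returns None when nothing matches, so reading ["srcset"] directly would raise a TypeError on a <picture> without a JPEG source. A minimal sketch that guards against that, using made-up sample markup (the second <picture> here is an assumption added to show the missing case):

```python
from bs4 import BeautifulSoup

html_source = """
<picture>
<source srcset="/shop/media/L004D000_picture.PNG?width=640&format=webp" type="image/webp"/>
<source srcset="/shop/media/L004D000_picture.PNG?width=640" type="image/jpeg"/>
</picture>
<picture>
<source srcset="/shop/media/other.PNG?format=webp" type="image/webp"/>
</picture>
"""

soup = BeautifulSoup(html_source, "html.parser")
jpeg_urls = []
for result in soup.select("picture"):
    # select_one() returns None when this <picture> has no JPEG <source>
    jpeg_img = result.select_one('source[type="image/jpeg"]')
    if jpeg_img is not None:
        # .get() returns None instead of raising KeyError if srcset is missing
        jpeg_urls.append(jpeg_img.get("srcset"))
print(jpeg_urls)
```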
Or, if you want the first JPEG or PNG source:
for result in results:
    img = result.select_one('source[type="image/jpeg"], source[type="image/png"]')
    print(img)
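One caveat: with a comma-separated selector list, select_one() returns whichever matching tag comes first in document order, not the first selector in the list. If you actually need "JPEG if present, otherwise PNG", a small sketch that checks types in priority order (preferred_source and the sample HTML are hypothetical):

```python
from bs4 import BeautifulSoup

html_source = """
<picture>
<source srcset="/img/a.png" type="image/png"/>
<source srcset="/img/a.jpg" type="image/jpeg"/>
</picture>
"""

def preferred_source(picture, types=("image/jpeg", "image/png")):
    """Hypothetical helper: return the <source> for the earliest type in `types` that exists."""
    for mime in types:
        tag = picture.find("source", attrs={"type": mime})
        if tag is not None:
            return tag
    return None  # none of the wanted types is present

soup = BeautifulSoup(html_source, "html.parser")
picture = soup.select_one("picture")

# The selector list matches in document order, so the PNG comes back here
print(picture.select_one('source[type="image/jpeg"], source[type="image/png"]')["srcset"])
# The helper tries JPEG first, regardless of where it sits in the markup
print(preferred_source(picture)["srcset"])
```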