The code below is an example.
The first img tag has the class class_name and a src attribute that contains a link. The remaining img tags have different classes and no src attribute, only data-src.
So when I try to get the links, I get either only the first one (with get('src')) or only the rest (if I change get('src') to get('data-src')).
Is there any way to get all of the links, as plain text only?
import requests
from bs4 import BeautifulSoup

url = 'https://website.com'  # placeholder URL
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# The page contains img tags like these:
# <img class="class_name" src="https://website1.png"/>
# <img class="class_name late" data-src="https://website2.png"/>
# <img class="class_name late" data-src="https://website3.png"/>

for link in soup.find_all('img', class_='class_name'):
    # Only the first tag has src; the others give None here
    print(link.get('src'))
Thanks
I need the output like this:
https://website1.png
https://website2.png
https://website3.png
Simply select all of the images, iterate over the ResultSet, and check which attribute is available to extract its value. Then print it, or append it to a list, or to a set if you want to avoid duplicates.
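from bs4 import BeautifulSoup

html = '''
<img class="class_name" src="https://website1.png"/>
<img class="class_name late" data-src="https://website2.png"/>
<img class="class_name late" data-src="https://website3.png"/>
'''

# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')

# img.class_name matches every img whose class list includes class_name
for link in soup.select('img.class_name'):
    if link.get('src'):
        print(link.get('src'))
    else:
        # Fall back to data-src when src is missing
        print(link.get('data-src'))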
https://website1.png
https://website2.png
https://website3.png
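If you would rather collect the links than print them, the list/set variant mentioned above could look roughly like the sketch below. The helper name extract_image_urls and the extra duplicate img tag in the sample HTML are my own additions for illustration, not part of the question. It uses a list together with a set so that duplicates are dropped while the document order is kept.

```python
from bs4 import BeautifulSoup

# Sample HTML from the question, plus one duplicate tag (added for illustration)
html = '''
<img class="class_name" src="https://website1.png"/>
<img class="class_name late" data-src="https://website2.png"/>
<img class="class_name late" data-src="https://website3.png"/>
<img class="class_name late" data-src="https://website3.png"/>
'''


def extract_image_urls(html, css_selector='img.class_name'):
    """Hypothetical helper: return unique image URLs in document order."""
    soup = BeautifulSoup(html, 'html.parser')
    urls = []
    seen = set()
    for img in soup.select(css_selector):
        # Prefer src; fall back to data-src when src is missing or empty
        url = img.get('src') or img.get('data-src')
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


image_urls = extract_image_urls(html)
print('\n'.join(image_urls))
```

The `or` fallback is just a shorter form of the if/else above. If the order of the links does not matter, a plain set would be enough.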