
The first img tag uses a different attribute than the others, so I can't get all the links (web scraping with Python)


The code below is an example.

The first img tag has the class class_name and a src attribute that contains the link. The remaining img tags have a different class value and no src attribute; they only have data-src.

So when I try to get the links, I can get either the first link with get('src'), or the remaining links if I change it to get('data-src'), but never all of them at once.

Is there any way to get the links only as text?

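import requests
from bs4 import BeautifulSoup

url = 'website.com'
# In the real script the page is fetched and parsed, e.g.:
# soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# Sample of the img tags found on the page
html = '''
<img class="class_name" src="https://website1.png"/>
<img class="class_name late" data-src="https://website2.png"/>
<img class="class_name late" data-src="https://website3.png"/>
'''
soup = BeautifulSoup(html, 'html.parser')

for link in soup.find_all('img', class_='class_name'):
    # Prints the link for the first img only; the others give None
    print(link.get('src'))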

Thanks

I need the output like this:

https://website1.png
https://website2.png
https://website3.png

Solution

  • Simply select all of the images, iterate over the ResultSet, and check which attribute is available to extract its value. Then print it, or append it to a list (or to a set, if you want to avoid duplicates).

    Example

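    from bs4 import BeautifulSoup

    html = '''
    <img class="class_name" src="https://website1.png"/>
    <img class="class_name late" data-src="https://website2.png"/>
    <img class="class_name late" data-src="https://website3.png"/>
    '''
    # Name the parser explicitly to avoid the "no parser specified" warning
    soup = BeautifulSoup(html, 'html.parser')

    # img.class_name matches every img whose class list includes class_name
    for link in soup.select('img.class_name'):
        # Use src when present, otherwise fall back to data-src
        if link.get('src'):
            print(link.get('src'))
        else:
            print(link.get('data-src'))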

    Output

    https://website1.png
    https://website2.png
    https://website3.png
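
If you want to collect the links instead of printing them, the same check can be wrapped in a small helper. This is only a minimal sketch: the function name extract_image_urls and its css_selector parameter are made up for illustration, and the sample HTML has an extra duplicate tag added to show the de-duplication. It assumes that an empty or missing src should fall back to data-src.

```python
from bs4 import BeautifulSoup


def extract_image_urls(html, css_selector="img.class_name"):
    """Return unique image URLs in document order, preferring src over data-src.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    seen = set()
    for img in soup.select(css_selector):
        # Fall back to data-src when src is missing or empty
        url = img.get("src") or img.get("data-src")
        # Skip tags with neither attribute and URLs already collected
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Sample HTML; the last tag repeats website3 on purpose
html = '''
<img class="class_name" src="https://website1.png"/>
<img class="class_name late" data-src="https://website2.png"/>
<img class="class_name late" data-src="https://website3.png"/>
<img class="class_name late" data-src="https://website3.png"/>
'''

for url in extract_image_urls(html):
    print(url)
```

A list plus a seen set is used here rather than a plain set so that the links keep the order in which they appear on the page.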