
Scrape image URLs from a product page on an e-commerce website


I am scraping an e-commerce website for practice, and I am having trouble scraping a product's images. I have scraped the HTML for all of the product's images, but I can't extract the links from that HTML.

The code I tried is:

import requests
from bs4 import BeautifulSoup
import pandas as pd
baseurl='https://www.preispirat24.com/neu-im-september/'
baseforimages='https://www.preispirat24.com/'
headers={
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
   
}   

productlinks=[]
for x in range(0,1,1):    
    r=requests.get('https://www.preispirat24.com/neu-im-september/?page={}'.format(x))
    soup=BeautifulSoup(r.content, 'html.parser')

    productlist=soup.find_all('div',class_='title-description')
    item='title-description'

    for item in productlist:
        for link in item.find_all('a',href=True):
            productlinks.append(link['href'])
            a=(link['href'])
            
            

#testlink='https://www.preispirat24.com/Lufterfrischer/axe-air-fresher/axe-mini-vent-dark-temptation-air-freshener-lufterfrischer-6er-t-dsp.html'
insultlist=[]
images=[]
for link in productlinks:
    b=link
    try:
        r=requests.get(link,headers=headers)
        soup=BeautifulSoup(r.content, 'html.parser')
        title=soup.find('h1',class_="product-info-title-desktop hidden-xs hidden-sm").text.strip()
        description=soup.find(class_='tab-body active',itemprop="description").text.strip()
        itemnumber=soup.find('span',itemprop="model").text.strip()

        images=soup.find_all(class_='align-vertical')
        print(images)
        #print (images['src'])
    except:
        print('----')
    insult={
        'title':title,
        'description':description,
        'itemnumber':itemnumber,
        'images':images,
        'productlink':b
    }
   
    insultlist.append(insult)
df=pd.DataFrame(insultlist)
print('Saving :',title)
print(df.head)
df.to_csv('3veerapreispirat24.csv')

The output I get is something like:

<img alt="Mobile Preview: 99671" data-magnifier-src="images/product_images/original_images/99671(1).jpg" src="images/product_images/gallery_images/99671(1).jpg" title="Mobile Preview: 99671"/>
</div>, <div class="align-vertical">
<img alt="Mobile Preview: 99671" data-magnifier-src="images/product_images/original_images/99671.jpg" src="images/product_images/gallery_images/99671.jpg" title="Mobile Preview: 99671"/>
</div>]

The output I want:

images/product_images/original_images/99671(1).jpg
images/product_images/gallery_images/99671(1).jpg
images/product_images/original_images/99671.jpg
images/product_images/gallery_images/99671.jpg

Note: I have tried print(images['src']), but it raised an exception, so the except branch printed ----.

Example product link from which the product images should be extracted:

Link Here

Thanks in advance for helping.


Solution

  • To get the image URLs from a product page, you can use this example:

    import requests
    from bs4 import BeautifulSoup
    
    
    url = 'https://www.preispirat24.com/Verbrauchsartikel/Hygiene-Artikel-127/mund-nasen-maske-3-lagig-pink-mit-nasenbuegel-ohrschlaufen-einheitsgroesse-10-stuec.html'
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    
    for img in soup.select('#product_thumbnail_swiper [data-magnifier-src]'):
        print('https://www.preispirat24.com/' + img['data-magnifier-src'])
    

    Prints:

    https://www.preispirat24.com/images/product_images/original_images/99649mix.jpg
    https://www.preispirat24.com/images/product_images/original_images/99649.jpg
    https://www.preispirat24.com/images/product_images/original_images/99649_0.jpg
    https://www.preispirat24.com/images/product_images/original_images/99649_1.jpg
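
As for why print(images['src']) fails in the question: find_all() returns a list-like ResultSet, not a single tag, so it cannot be indexed by attribute name. On top of that, the elements matched by class_='align-vertical' are the wrapping divs, not the img tags inside them. Below is a minimal offline sketch that reads both src and data-magnifier-src from each img. SAMPLE_HTML is copied from the output shown in the question, and the helper name image_urls is only illustrative.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Markup modeled on the output shown in the question (not fetched live).
SAMPLE_HTML = """
<div class="align-vertical">
<img alt="Mobile Preview: 99671" data-magnifier-src="images/product_images/original_images/99671(1).jpg" src="images/product_images/gallery_images/99671(1).jpg" title="Mobile Preview: 99671"/>
</div>
<div class="align-vertical">
<img alt="Mobile Preview: 99671" data-magnifier-src="images/product_images/original_images/99671.jpg" src="images/product_images/gallery_images/99671.jpg" title="Mobile Preview: 99671"/>
</div>
"""

BASE_URL = 'https://www.preispirat24.com/'


def image_urls(html, base_url=BASE_URL):
    """Return absolute URLs for the full-size and gallery version of each image."""
    soup = BeautifulSoup(html, 'html.parser')
    urls = []
    # Select the <img> tags inside the wrappers, then read each attribute in turn.
    for img in soup.select('.align-vertical img'):
        for attr in ('data-magnifier-src', 'src'):
            value = img.get(attr)  # .get() returns None if the attribute is missing
            if value:
                urls.append(urljoin(base_url, value))
    return urls


urls = image_urls(SAMPLE_HTML)
for u in urls:
    print(u)
```

For this sample the loop prints four URLs, in the order the question asks for: the original and then the gallery image for 99671(1).jpg, followed by the same pair for 99671.jpg.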
    

    EDIT: To save the product data to a CSV file, you can do:

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    
    
    url = 'https://www.preispirat24.com/Verbrauchsartikel/Hygiene-Artikel-127/mund-nasen-maske-3-lagig-pink-mit-nasenbuegel-ohrschlaufen-einheitsgroesse-10-stuec.html'
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    
    
    title=soup.find('h1',class_="product-info-title-desktop hidden-xs hidden-sm").text.strip()
    description=soup.find(class_='tab-body active',itemprop="description").text.strip()
    itemnumber=soup.find('span',itemprop="model").text.strip()
    
    images = []
    for img in soup.select('#product_thumbnail_swiper [data-magnifier-src]'):
        images.append('https://www.preispirat24.com/' + img['data-magnifier-src'])
        # print('https://www.preispirat24.com/' + img['data-magnifier-src'])
    
    df = pd.DataFrame({
            'title':title,
            'description':description,
            'itemnumber':itemnumber,
            'images':[images],
            'productlink':url
        })
    
    df.to_csv('data.csv')
    print(df)
    

    Prints:

                                                   title  ...                                        productlink
    0  Mund Nasen Maske 3-lagig PINK mit Nasenbügel, ...  ...  https://www.preispirat24.com/Verbrauchsartikel...
    
    [1 rows x 5 columns]
    

    And saves the result to data.csv.
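
To cover every link in productlinks as in the question, one option is to move the parsing into a function that returns one row per product. The sketch below is an assumption about how that could look, not a tested scraper for the live site: parse_product and text_or_none are hypothetical helpers, the selectors are taken from the code above (slightly loosened), and SAMPLE_HTML is made-up markup for an offline demonstration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'https://www.preispirat24.com/'

# Made-up markup that mimics the structure the selectors above expect.
SAMPLE_HTML = """
<h1 class="product-info-title-desktop hidden-xs hidden-sm"> Demo product </h1>
<div class="tab-body active" itemprop="description">Demo description</div>
<span itemprop="model">12345</span>
<div id="product_thumbnail_swiper">
  <div class="align-vertical">
    <img data-magnifier-src="images/a.jpg" src="images/b.jpg"/>
  </div>
</div>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_product(html, page_url):
    """Build one CSV row (a dict) from the HTML of a single product page."""
    soup = BeautifulSoup(html, 'html.parser')
    images = [
        urljoin(BASE_URL, img['data-magnifier-src'])
        for img in soup.select('#product_thumbnail_swiper [data-magnifier-src]')
    ]
    return {
        'title': text_or_none(soup.find('h1', class_='product-info-title-desktop')),
        'description': text_or_none(soup.find(itemprop='description')),
        'itemnumber': text_or_none(soup.find('span', itemprop='model')),
        # Join the URLs into one string so the CSV cell stays easy to split later.
        'images': ' | '.join(images),
        'productlink': page_url,
    }


row = parse_product(SAMPLE_HTML, 'https://example.com/demo-product.html')
print(row)
```

With this shape, each product gets a fresh dict, and a missing element becomes None instead of raising. In the question's loop, the bare except lets the code continue to the dict after a failure, so a failed page can silently reuse the previous product's values. To use it on the real site, pass requests.get(link, headers=headers).content for each link to parse_product, collect the rows in a list, and hand that list to pd.DataFrame before calling to_csv.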