Tags: python, html, beautifulsoup, src, onerror

Python requests gives image src as relative path instead of absolute path


In the screenshot below, the src of the image is shown as a link.

[Screenshot: src of an image in html]

But when I use BeautifulSoup, I get this output:

image['src']
assets/images/content/TUL_5890.jpg

Could you please let me know how to extract the image link in such a case? I think it is because of the onerror attribute in the code, but I don't know how to fix it.


Solution

  • If you look at the response HTML held in soup, you will see:

    <a class="img-wrapper fancybox" data-caption="Pedestrian Crosswalk Sign" data-fancybox="group" href="assets/images/content/street_view_1a.jpg">
    <img alt="Pedestrian Crosswalk Sign" src="assets/images/content/street_view_1a.jpg"/>
    

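    The src here does not contain the full path that you see in Chrome. The full path is most likely filled in by your browser, which resolves relative paths into absolute URLs, and that is why you were not getting it from the raw HTML. You will have to extract the src attribute of each tag and prepend the site's base URL to it.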

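    from bs4 import BeautifulSoup
    import requests

    # Fetch the raw HTML as the server sends it (no JavaScript is executed)
    response = requests.get('https://www.pexco.com/traffic/products/pedestrian-safety-products/in-street-pedestrian-crosswalk-signs/')

    soup = BeautifulSoup(response.text, 'lxml')
    for img_tag in soup.find_all('img'):
        img_src = img_tag.get('src')  # None if the tag has no src attribute
        if not img_src:
            continue
        if 'assets' in img_src:
            # Relative asset path: prepend the site root
            print('https://www.pexco.com/' + img_src)
        else:
            # Already an absolute URL
            print(img_src)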
    

    This gives us:

    https://www.webtraxs.com/webtraxs.php?id=pexco&st=img
    https://www.pexco.com/assets/images/template/pexco-logo-dark.svg
    https://www.pexco.com/assets/images/banners/bg-banner-traffic-desktop.jpg
    https://www.pexco.com/assets/images/content/TUL_5890.jpg
    https://www.pexco.com/assets/images/content/Davidson_STOP_4_Ped_Sign_Atlanta_012309.jpg
    https://www.pexco.com/assets/images/content/P0000689.jpg
    https://www.pexco.com/assets/images/content/street_view_1a.jpg
    https://www.pexco.com/assets/images/content/street_view_2a.jpg
    https://www.pexco.com/assets/images/content/TUL_5890.jpg
    https://www.pexco.com/assets/images/content/Davidson_STOP_4_Ped_Sign_Atlanta_012309.jpg
    https://www.pexco.com/assets/images/content/P0000689.jpg
    https://www.pexco.com/assets/images/content/street_view_1a.jpg
    https://www.pexco.com/assets/images/content/street_view_2a.jpg
    https://www.pexco.com/assets/images/content/CADdetails_Microsite_Button.jpg
    https://www.pexco.com/assets/images/template/pexco-logo-dark.svg
    https://www.pexco.com/assets/images/template/fb-icon.jpg
    https://www.pexco.com/assets/images/template/LI-icon.jpg
    https://www.pexco.com/assets/images/template/YT-icon.jpg
    https://px.ads.linkedin.com/collect/?pid=2856522&fmt=gif
    

    EDIT:

    As discussed with the OP, she needs a solution that returns the full URL directly. Selenium can be used in this case, since it reads each src from the page as loaded in a real browser.

    Please try the following code.

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    chrome_path = r"C:\Users\hpoddar\Desktop\Tools\chromedriver_win32\chromedriver.exe" # PUT YOUR CHROMEDRIVER PATH HERE
    s = Service(chrome_path)

    url = 'https://www.pexco.com/traffic/products/pedestrian-safety-products/in-street-pedestrian-crosswalk-signs/'
    driver = webdriver.Chrome(service=s)
    driver.get(url)

    # The browser has already resolved each src to an absolute URL
    images = driver.find_elements(By.TAG_NAME, 'img')
    for image in images:
        print(image.get_attribute('src'))

    driver.quit()  # close the browser when done
    

    This gives us the expected output:

    https://www.pexco.com/assets/images/template/pexco-logo-dark.svg
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/banners/bg-banner-traffic-desktop.jpg
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/TUL_5890.jpg
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/Davidson_STOP_4_Ped_Sign_Atlanta_012309.jpg
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/P0000689.jpg
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/street_view_1a.jpg
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/street_view_2a.jpg
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/TUL_5890.jpg
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/Davidson_STOP_4_Ped_Sign_Atlanta_012309.jpg
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/P0000689.jpg
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/street_view_1a.jpg
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/street_view_2a.jpg
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/CADdetails_Microsite_Button.jpg
    https://www.pexco.com/assets/images/template/pexco-logo-dark.svg
    https://www.gstatic.com/images/branding/googlelogo/1x/googlelogo_color_42x16dp.png
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/template/fb-icon.jpg
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/template/LI-icon.jpg
    https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/template/YT-icon.jpg
    https://www.gstatic.com/images/branding/product/1x/translate_24dp.png
    https://marvel-b1-cdn.bc0a.com/f00000000266812/cdn1.thelivechatsoftware.com/assets/interchanges/pexco.com/resources/pexco_2021-11-18.03-09-45.png
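
For the requests/BeautifulSoup route, the string concatenation and the 'assets' check could probably be replaced with urljoin from the standard library, which resolves a relative path against a base URL in roughly the way a browser does. The sketch below is only an illustration: resolve_srcs is a hypothetical helper name, and the base_href value is an assumption chosen to match the site root used in the first snippet, not something confirmed from the page's markup.

```python
from urllib.parse import urljoin


def resolve_srcs(srcs, page_url, base_href=None):
    """Turn raw src values into absolute URLs.

    base_href is the value of the page's <base href="..."> element, if it
    has one; browsers resolve relative paths against it instead of the
    page URL.
    """
    base_url = urljoin(page_url, base_href) if base_href else page_url
    # urljoin returns already-absolute URLs unchanged and resolves the rest.
    # Missing or empty src values are skipped.
    return [urljoin(base_url, src) for src in srcs if src]


# With BeautifulSoup, the inputs could be collected like this:
#   srcs = [img.get('src') for img in soup.find_all('img')]
#   base = soup.find('base', href=True)
#   base_href = base['href'] if base else None

page_url = 'https://www.pexco.com/traffic/products/pedestrian-safety-products/in-street-pedestrian-crosswalk-signs/'
srcs = [
    'assets/images/content/TUL_5890.jpg',
    'https://px.ads.linkedin.com/collect/?pid=2856522&fmt=gif',
    None,  # an <img> without a src attribute
]

# Assumed base href; check the real page before relying on it.
for url in resolve_srcs(srcs, page_url, base_href='https://www.pexco.com/'):
    print(url)
```

If base_href is left out, the relative path is resolved under the page's own directory (.../in-street-pedestrian-crosswalk-signs/assets/...), which does not match the URLs listed above, so it is worth checking whether the page declares a base element before choosing what to resolve against.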