python selenium-webdriver web-scraping web

How do I prevent the image URL from changing after scraping it with Selenium?


I'm trying to scrape the cover photo of a Facebook event page using Python and Selenium.

It returns the URL correctly at first, but the URL then changes and I end up with a blurry version of the image.

from selenium import webdriver
from selenium.webdriver.common.by import By
import requests

url = 'https://www.facebook.com/events/851274783286508/?active_tab=about'
driver = webdriver.Edge()
driver.get(url)
# CSS selector built from Facebook's generated class names on the cover <img>
cover_class = 'img.x1ey2m1c.x9f619.xds687c.x5yr21d.x10l6tqk.x17qophe.x13vifvy.xh8yej3'

img = driver.find_element(By.CSS_SELECTOR, cover_class)
print(img.get_attribute('src'))

# Download the image from the scraped URL
response = requests.get(img.get_attribute('src'))


##################  THIS IS THE URL THAT SHOULD BE RETURNED  ################################
#perfect_src = https://scontent.fpsd2-1.fna.fbcdn.net/v/t39.30808-6/321296208_468127735500704_5967700403570897493_n.jpg?_nc_cat=106&ccb=1-7&_nc_sid=340051&_nc_eui2=AeFSxY7I0TKJvJRIx37yNz_jvkkF4L__0_G-SQXgv__T8a3A9q5B5t3jD54iDEhyayU7Wf-86Xko3e-lkPKiOl8v&_nc_ohc=kU48KdHWOFcAX_w-iW0&_nc_ht=scontent.fpsd2-1.fna&oh=00_AfCjDp7TSLV22b8WigiDlMedF9ONKuDXmW885vje3OAlMQ&oe=64573615

##################  THIS IS THE ACTUAL IMAGE URL I'M GETTING  ################################
#mehogcy_src = https://scontent.fpsd2-1.fna.fbcdn.net/v/t39.30808-6/321296208_468127735500704_5967700403570897493_n.jpg?stp=dst-jpg_fb50_s320x320&_nc_cat=106&ccb=1-7&_nc_sid=340051&_nc_ohc=kU48KdHWOFcAX_w-iW0&_nc_ht=scontent.fpsd2-1.fna&oh=00_AfDSvqeVreyDGTVq4If0dBtyBA6H1PA0YV9CVaY3DLnupg&oe=64573615

How do I fix that?

I tried to scrape it with Selenium, as shown in the code above.
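Comparing the two URLs from the question shows what actually changes. The minimal sketch below uses only the standard library to parse both query strings. The image path and host are identical; the blurry URL adds an stp parameter (dst-jpg_fb50_s320x320, which looks like a small, blurred rendition, though that reading is an assumption), the sharp URL carries an extra _nc_eui2 value, and the oh value differs. Because oh appears to be a signature over the other parameters, simply deleting stp from the blurry URL is unlikely to work; letting the browser finish loading the full image is the more reliable route. The helper name query_params is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# The two URLs from the question: the sharp one and the blurry one.
perfect_src = "https://scontent.fpsd2-1.fna.fbcdn.net/v/t39.30808-6/321296208_468127735500704_5967700403570897493_n.jpg?_nc_cat=106&ccb=1-7&_nc_sid=340051&_nc_eui2=AeFSxY7I0TKJvJRIx37yNz_jvkkF4L__0_G-SQXgv__T8a3A9q5B5t3jD54iDEhyayU7Wf-86Xko3e-lkPKiOl8v&_nc_ohc=kU48KdHWOFcAX_w-iW0&_nc_ht=scontent.fpsd2-1.fna&oh=00_AfCjDp7TSLV22b8WigiDlMedF9ONKuDXmW885vje3OAlMQ&oe=64573615"
blurry_src = "https://scontent.fpsd2-1.fna.fbcdn.net/v/t39.30808-6/321296208_468127735500704_5967700403570897493_n.jpg?stp=dst-jpg_fb50_s320x320&_nc_cat=106&ccb=1-7&_nc_sid=340051&_nc_ohc=kU48KdHWOFcAX_w-iW0&_nc_ht=scontent.fpsd2-1.fna&oh=00_AfDSvqeVreyDGTVq4If0dBtyBA6H1PA0YV9CVaY3DLnupg&oe=64573615"


def query_params(url):
    """Return the URL's query string as a {name: value} dict (first value per name)."""
    return {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}


perfect = query_params(perfect_src)
blurry = query_params(blurry_src)

# Same scheme, CDN host and file path: both URLs refer to the same image file.
same_file = urlsplit(perfect_src)[:3] == urlsplit(blurry_src)[:3]

only_in_blurry = {k: blurry[k] for k in blurry.keys() - perfect.keys()}
only_in_perfect = {k: perfect[k] for k in perfect.keys() - blurry.keys()}
changed = {k for k in perfect.keys() & blurry.keys() if perfect[k] != blurry[k]}

print(same_file)               # True
print(only_in_blurry)          # {'stp': 'dst-jpg_fb50_s320x320'}
print(sorted(only_in_perfect)) # ['_nc_eui2']
print(sorted(changed))         # ['oh']
```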


Solution

  • I changed the following line:

    img = driver.find_element(By.CSS_SELECTOR, cover_class)
    

    To:

    wait = WebDriverWait(driver, 20)
    img = wait.until(EC.visibility_of_element_located((By.XPATH, "//img[@data-imgperflogname='profileCoverPhoto']")))
    

    Imports required:

    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    

    Result:

    https://scontent-fra5-2.xx.fbcdn.net/v/t39.30808-6/321296208_468127735500704_5967700403570897493_n.jpg?stp=dst-jpg_s960x960&_nc_cat=106&ccb=1-7&_nc_sid=340051&_nc_ohc=kU48KdHWOFcAX8gWjDP&_nc_ht=scontent-fra5-2.xx&oh=00_AfA7hkoiTjbXx_ZMAKJEbYkLxLhlW-BsfxiMtFOBjLIIqA&oe=64593055
    
    Process finished with exit code 0
    

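    When I open the URL printed on the console, it shows the cover photo. Note that it is served as a 960x960 rendition (stp=dst-jpg_s960x960) instead of the blurry 320x320 one (stp=dst-jpg_fb50_s320x320).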
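Waiting for visibility works in this case, but a visible img element may in general still carry the preview src. A hedged sketch of a stricter wait: WebDriverWait.until() accepts any callable that takes the driver, and it keeps polling while that callable returns a falsy value. The helpers is_placeholder_src and sharp_image_src are hypothetical names, and their heuristic (an fb50 token, or a side shorter than 600 px, in the stp parameter means "blurry preview") is an assumption drawn only from the URLs in this question, not documented Facebook behaviour.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches the "_s<width>x<height>" size token seen in the "stp" parameter,
# e.g. "dst-jpg_fb50_s320x320" or "dst-jpg_s960x960".
_SIZE_RE = re.compile(r"_s(\d+)x(\d+)")


def is_placeholder_src(src, min_side=600):
    """Heuristic (assumption): True if src looks like the small/blurred preview."""
    if not src:
        return True  # no src yet: keep waiting
    stp = parse_qs(urlsplit(src).query).get("stp", [""])[0]
    if "fb50" in stp:
        return True  # token present in the blurry URL from the question
    match = _SIZE_RE.search(stp)
    # No size token (e.g. no stp at all) is treated as the full image.
    return bool(match) and min(int(match.group(1)), int(match.group(2))) < min_side


def sharp_image_src(locator, min_side=600):
    """Build a condition for WebDriverWait.until().

    locator is a (by, value) tuple such as (By.XPATH, "..."). The returned
    callable gives back the img src once it no longer looks like a
    placeholder, and False otherwise so that the wait keeps polling.
    """
    def _condition(driver):
        src = driver.find_element(*locator).get_attribute("src")
        return False if is_placeholder_src(src, min_side) else src

    return _condition


# Usage (needs Selenium and a running browser; the XPath is the answer's).
# WebDriverWait already ignores NoSuchElementException; stale elements are
# added here in case the page swaps the <img> node while loading.
#   from selenium.webdriver.common.by import By
#   from selenium.webdriver.support.wait import WebDriverWait
#   from selenium.common.exceptions import StaleElementReferenceException
#   wait = WebDriverWait(driver, 20,
#                        ignored_exceptions=[StaleElementReferenceException])
#   src = wait.until(sharp_image_src(
#       (By.XPATH, "//img[@data-imgperflogname='profileCoverPhoto']")))
```

If the timeout expires while only the preview is available, until() raises TimeoutException, which makes the failure explicit instead of silently returning the blurry URL.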