
Scraping GIF URLs from Websites


I am very new to web scraping and am trying to scrape GIF URLs from a website. For example, on gifer.com I want to search for "smile" GIFs and then collect the URLs of all the GIFs listed. Below is an example of the page source from which I want to extract the src attribute of the video's source element (https://i.gifer.com/ON0.mp4 in this case).

<div class="page-media-swipe desktop">
  <div class="container">
    <div class="swipe-left">
      <span class="icon-arrow-left-2  icon" style="color: rgb(255, 255, 255); font-size: 44px;"></span>
    </div>
    <div class="media desktop" style="width: 367.462px;">
      <div style="padding-top: 122.462%;">
        <div class="media-container1">
          <div class="media-container2" style="width: 367.462px;">
            <div>
              <video poster="https://i.gifer.com/fetch/w300-preview/d0/d0e6e89a42c43d31b5913e232d87af7b.gif" class="full-media" loop="" autoplay="" playsinline="">
                <source src="https://i.gifer.com/ON0.mp4" type="video/mp4">
              </video>
            </div>
          </div>
        </div>
      </div>
    </div>
    <div class="swipe-right">
      <span class="icon-arrow-right-2  icon" style="color: rgb(255, 255, 255); font-size: 44px;">
      </span>
    </div>
  </div>
</div>

There are thousands of such results, and I was advised to use Python and Selenium. However, my knowledge of both is limited. I tried the code below but have not made much headway.


from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Configure a headless Chrome session
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1920,1200")
# Pass the options to the driver; otherwise they have no effect
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

driver.get("https://gifer.com/en/gifs/smile")
imgResults = driver.find_elements(By.CLASS_NAME, "media-container2")

print(len(imgResults))
#print(driver.page_source)
# Printing a WebElement shows its repr, not its HTML or attributes
for element in imgResults:
    print(element)

driver.quit()

The above returns 4 elements:

<selenium.webdriver.remote.webelement.WebElement (session="fac424650675a90b2a8dee91efdc01f4", element="16e771ca-37d8-45a0-8200-0f03da0b7d14")>
<selenium.webdriver.remote.webelement.WebElement (session="fac424650675a90b2a8dee91efdc01f4", element="8c9abdcb-bc9d-47da-9958-109e722b3ae9")>
<selenium.webdriver.remote.webelement.WebElement (session="fac424650675a90b2a8dee91efdc01f4", element="d9640144-4ba1-414b-aa4f-5141387335ef")>
<selenium.webdriver.remote.webelement.WebElement (session="fac424650675a90b2a8dee91efdc01f4", element="9626db84-1da9-42ad-b314-56222a5e933b")>

What I cannot figure out is how to get the src link of the source element inside each video.


Solution

  • There is no need to load a new page to get the mp4 link. Each thumbnail's href ends with the GIF's code, which can be used to build the link:

    for img in driver.find_elements(By.CSS_SELECTOR, "figure a"):
        code = img.get_attribute('href').split('/')[-1]
        link = f'https://i.gifer.com/{code}.mp4'
        print(link)
    

    output

    https://i.gifer.com/fzvh.mp4
    https://i.gifer.com/7F5y.mp4
    https://i.gifer.com/6qOR.mp4
    https://i.gifer.com/3JT.mp4
    ...
    

    You can also obtain the list of links in one line:

    links = [f"https://i.gifer.com/{img.get_attribute('href').split('/')[-1]}.mp4" for img in driver.find_elements(By.CSS_SELECTOR, "figure a")]
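Since the question mentions thousands of results, the loop above could be wrapped in a scrolling routine along these lines. This is only a sketch under assumptions: it assumes the results page loads more thumbnails as it is scrolled, and the names `mp4_link_from_href`, `collect_mp4_links`, `max_scrolls` and `pause` are made up for illustration. It uses the same "figure a" selector and the same i.gifer.com URL pattern as above, and skips links it has already seen.

```python
import time
from urllib.parse import urlparse


def mp4_link_from_href(href):
    """Build the i.gifer.com mp4 URL from a thumbnail link.

    Takes the last path segment of the href as the GIF code, like
    split('/')[-1] in the loop above, but ignores any query string
    or trailing slash. Returns None when no code can be found.
    """
    if not href:
        return None
    code = urlparse(href).path.rstrip("/").split("/")[-1]
    return f"https://i.gifer.com/{code}.mp4" if code else None


def collect_mp4_links(driver, max_scrolls=20, pause=1.0):
    """Scroll the page and gather unique mp4 links from "figure a" elements.

    Assumption: the page appends more thumbnails when scrolled.
    Stops after max_scrolls rounds or once a round finds no new links.
    """
    links = []
    seen = set()
    for _ in range(max_scrolls):
        found_new = False
        # "css selector" is the string value of selenium's By.CSS_SELECTOR
        for anchor in driver.find_elements("css selector", "figure a"):
            link = mp4_link_from_href(anchor.get_attribute("href"))
            if link and link not in seen:
                seen.add(link)
                links.append(link)
                found_new = True
        if not found_new:
            break
        # Jump to the bottom of the page to trigger the next batch
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more results
    return links
```

With the driver from the question already on the search page, `links = collect_mp4_links(driver)` would return the collected list. A fixed pause is the simplest choice here; an explicit wait on the number of "figure a" elements would likely be more reliable on a slow connection.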