Scraping all clip links from a Twitch directory

How do I collect the links of clips from a Twitch web page?

I want to get the links for all video clips in a Twitch directory like this one.

I tried Requests with BeautifulSoup, which failed, and then Requests with lxml, which failed too.

I also tried Selenium WebDriver to find the clip links by XPath and click on each one:

driver.find_element_by_xpath('__').click()

but that failed too, even though the XPath expressions are correct.

How do I collect the clip links from a Twitch web page? Please help.

Solution

You can identify the video clip URLs using the following XPath:

xpath = //a[@data-a-target='preview-card-image-link']
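To see what that XPath selects, here is a small offline sketch using Python's standard `xml.etree.ElementTree`, which supports the same descendant search (`.//a`) and attribute-equality predicate. The markup in `SAMPLE` and the helper `clip_hrefs` are hypothetical, made up for illustration; Twitch's real HTML is much larger, is not guaranteed to be well-formed XML, and its attributes may change, so on the live site you still need Selenium.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified clip-card markup (made up for illustration only).
SAMPLE = """<div>
  <article>
    <a data-a-target="preview-card-image-link" href="/example_channel/clip/ClipOne">thumbnail</a>
    <a href="/example_channel">channel name</a>
  </article>
  <article>
    <a data-a-target="preview-card-image-link" href="/example_channel/clip/ClipTwo">thumbnail</a>
    <a href="/example_channel">channel name</a>
  </article>
</div>"""


def clip_hrefs(markup):
    """Return the href of every <a data-a-target='preview-card-image-link'>."""
    root = ET.fromstring(markup)
    # ElementTree's XPath subset handles descendant search plus an
    # attribute-equality predicate, mirroring the Selenium XPath above.
    anchors = root.findall(".//a[@data-a-target='preview-card-image-link']")
    # Plain channel links lack the data-a-target attribute, so they are skipped.
    return [a.get("href") for a in anchors]


print(clip_hrefs(SAMPLE))
```

ElementTree returns the raw attribute value, so relative paths stay relative here.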

However, on the live page this XPath locates only the first 20 clip URLs, because the remaining clips are loaded dynamically as you scroll down. To get the first 20 URLs, you can try the following code:

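from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service('C:\\NotBackedUp\\chromedriver.exe'))
driver.get("https://www.twitch.tv/directory/game/Apex%20Legends/clips?fbclid=IwAR2xYPFh3Um2YS4EsDkjAdA0b-CMvjQTLVLeNW5D77-aPh3IqwW9c4e7lIM&range=24hr")
sleep(3)  # give the page time to render the first batch of clip cards
links = driver.find_elements(By.XPATH, "//a[@data-a-target='preview-card-image-link']")
for link in links:
    print(link.get_attribute('href'))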

If you count them manually by inspecting the page, there are 1020 clips on it. The following code scrolls the last loaded clip card into view once per expected batch of 20 clips (51 scrolls for 1020 clips), so that all of them get loaded, and then prints all the links:

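from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

CLIP_XPATH = "//a[@data-a-target='preview-card-image-link']"

driver = webdriver.Chrome(service=Service('C:\\NotBackedUp\\chromedriver.exe'))
driver.get("https://www.twitch.tv/directory/game/Apex%20Legends/clips?fbclid=IwAR2xYPFh3Um2YS4EsDkjAdA0b-CMvjQTLVLeNW5D77-aPh3IqwW9c4e7lIM&range=24hr")

sleep(3)  # wait for the first batch of clips to render
i = 1
while i <= 1020:  # 1020 = number of clips counted manually on the page
    links = driver.find_elements(By.XPATH, CLIP_XPATH)
    # Scrolling the last loaded card into view makes the page load the next batch
    driver.execute_script('arguments[0].scrollIntoView(true);', links[-1])
    print("=> i :", i)
    i += 20  # each scroll is expected to load about 20 more clips
    sleep(1)  # give the new batch time to load

# All batches should be loaded now, so collect every clip link at once
links = driver.find_elements(By.XPATH, CLIP_XPATH)
for link in links:
    print(link.get_attribute('href'))

print("=> Done...")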
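The loop above depends on the manually counted total of 1020, which will change as clips are added or expire. A possibly more robust approach, sketched here under assumptions, is to keep scrolling until the number of loaded links stops growing. The helper `scroll_until_stable` and its parameters `pause`, `patience` and `max_rounds` are hypothetical names introduced for this sketch, and it assumes Twitch keeps already loaded cards in the page instead of removing them while you scroll. The Selenium calls are passed in as callbacks so the stopping logic can be tried without a browser.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def scroll_until_stable(
    find_links: Callable[[], Sequence[T]],
    scroll_to: Callable[[T], None],
    pause: float = 1.0,
    patience: int = 2,
    max_rounds: int = 500,
) -> Sequence[T]:
    """Scroll the last loaded link into view until the link count stops growing.

    find_links returns the links currently loaded on the page, and scroll_to
    scrolls one link into view. Stops after `patience` consecutive scrolls that
    load nothing new, or after `max_rounds` scrolls in total.
    """
    links = find_links()
    stalled = 0
    for _ in range(max_rounds):
        if not links:
            break  # nothing to scroll to (page empty or not rendered yet)
        scroll_to(links[-1])
        time.sleep(pause)  # give the next batch time to load
        new_links = find_links()
        if len(new_links) > len(links):
            stalled = 0  # new clips appeared, keep going
        else:
            stalled += 1
            if stalled >= patience:
                return new_links  # assume every clip has been loaded
        links = new_links
    return links
```

With the `driver` from the answer (Selenium 4, `By` imported from `selenium.webdriver.common.by`), you would pass `lambda: driver.find_elements(By.XPATH, "//a[@data-a-target='preview-card-image-link']")` as `find_links` and `lambda el: driver.execute_script('arguments[0].scrollIntoView(true);', el)` as `scroll_to`, then read `get_attribute('href')` from the returned elements. `patience` is there so that one slow batch does not end the scrolling too early.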

I hope it helps.