python selenium-webdriver web-scraping twitter

How to get Selenium to extract the link for each tweet via clicking "Copy link" on Twitter?


I'm trying to extract the links to each specific tweet on Twitter using Selenium (with Python). E.g., let's say I'm looking at https://twitter.com/search?q=from%3A%40POTUS&src=typed_query&f=live

Each tweet has an arrow (share) button at the bottom right. Clicking it reveals three more options, the first of which is "Copy link", which copies the tweet's direct link to the clipboard.

How do I instruct Selenium to click that arrow + copy the link, and then save the URL for all the tweets on the page?

At the moment my code looks something like this:

from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get("https://twitter.com/search?q=from%3A%40POTUS&src=typed_query&f=live")

links = []
articles = driver.find_elements(By.XPATH, '//article[@data-testid = "tweet"]')
for article in articles:
    link = article.find_element(By.XPATH, './/svg').click()   # can't figure this part out
    links.append(link)

Solution

  • You don't need to click the icon to get the tweet link.

    The tweet itself contains that link inside one of its <a> elements.

    The locator for that element is the CSS selector [data-testid=User-Name] a[role=link][href*=status]

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    chrome_options = Options()
    chrome_options.add_argument("--disable-gpu")
    driver = webdriver.Chrome(options=chrome_options)
    driver.get("https://twitter.com/search?q=from%3A%40POTUS&src=typed_query&f=live")
    # log in here if Twitter asks for it, before the tweets are collected
    
    links = []
    wait = WebDriverWait(driver, 10)
    hrefs = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '[data-testid=User-Name] a[role=link][href*=status]')))
    for href in hrefs:
        link = href.get_attribute('href')
        links.append(link)
        print(link)
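
The hrefs collected this way may contain duplicates or variants of the same tweet URL (for example with a trailing /photo/1 or a query string), especially if the page is scanned again after scrolling. A minimal sketch of a clean-up step follows. It assumes each href is an absolute URL of the form https://twitter.com/<user>/status/<id>. The helper name canonical_tweet_links is hypothetical and is not part of Selenium or the original answer.

```python
import re
from urllib.parse import urlsplit

# Hypothetical helper for illustration; not part of Selenium.
# Assumes tweet URLs have a path that starts with /<user>/status/<numeric id>.
STATUS_PATH = re.compile(r"^/([^/]+)/status/(\d+)")


def canonical_tweet_links(hrefs):
    """Return unique canonical tweet URLs, keeping first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            # get_attribute('href') can return None for anchors without an href
            continue
        parts = urlsplit(href)
        match = STATUS_PATH.match(parts.path)
        if match is None:
            # not a tweet permalink (e.g. a profile link)
            continue
        user, tweet_id = match.groups()
        # drop anything after the tweet id (/photo/1, query string, fragment)
        url = f"{parts.scheme}://{parts.netloc}/{user}/status/{tweet_id}"
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

With the loop above, this could be applied once at the end, e.g. links = canonical_tweet_links(links), before saving the URLs.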