Tags: selenium-webdriver, web-scraping, xpath, css-selectors, webdriverwait

Capture all data in tr tag within binance.com using Python Selenium


I am unable to read all the data in the tbody tag on the Binance futures leaderboard page using Python Selenium. I am trying to scrape this link: https://www.binance.com/en/futures-activity/leaderboard/user/um?encryptedUid=14507FCBFF9FBE584EDDEC628C4593B8

I used the command below:

tr = driver.find_elements(By.TAG_NAME,'tbody')

but there is no text output.

I'm trying to get all the data from the tr tags under the tbody tag into an array or a list object. I also need to know how many tr tags are on the page.


Solution

  • For your task, you can use Selenium + BeautifulSoup: open the page in Selenium, wait for it to load, and then parse the rendered page source into a 'soup' object. First we find the 'tbody', then we search for all 'tr' elements, and for each 'tr' we find all its 'td' elements. We extract the data and write it to a list: the first element is the 'Symbol', the second is the number of 'td' cells in that row, followed by the data from each cell. Code:

    from bs4 import BeautifulSoup
    from selenium import webdriver
    import time
    from selenium.webdriver.chrome.options import Options
    
    url = 'https://www.binance.com/en/futures-activity/leaderboard/user/um?encryptedUid=14507FCBFF9FBE584EDDEC628C4593B8'
    
    
    def get_result(url):
        chrome_options = Options()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--no-sandbox')
        driver = webdriver.Chrome(options=chrome_options, executable_path=".../chromedriver_linux64/chromedriver")  # the path to your chromedriver
        driver.get(url)
        time.sleep(10)  # give the JavaScript-rendered table time to load
        html = driver.page_source
        soup = BeautifulSoup(html, "lxml")
        tbody = soup.find('tbody', class_='bn-table-tbody')
        trs = tbody.find_all('tr')
        data = list()
        for tr in trs:
            tr_key = tr.get('data-row-key')
            if tr_key is None:
                continue
            mid_data = list()
            mid_data.append(f'Symbol - {tr_key}')
            tds = tr.find_all('td')
            mid_data.append(f'td_count - {len(tds)}')
            for count, td in enumerate(tds, start=1):
                mid_data.append(f'td_{count} - {td.text}')
            print(mid_data)
            data.append(mid_data)
        driver.quit()
        return data
        
    
    def main():
        get_result(url=url)
    
    
    if __name__ == "__main__":
        main()
    

    Will return:

    ['Symbol - SOLUSDT', 'td_count - 7', 'td_1 - SOLUSDT Perpetual Short20x', 'td_2 - 7036', 'td_3 - 21.9520', 'td_4 - 20.4050', 'td_5 - 10,884.64\xa0(151.6288%)', 'td_6 - 2023-03-03 17:01:49', 'td_7 - Trade']
    ['Symbol - ETHUSDT', 'td_count - 7', 'td_1 - ETHUSDT Perpetual Short30x', 'td_2 - 385.383', 'td_3 - 1,562.54', 'td_4 - 1,564.50', 'td_5 - -754.66\xa0(-3.7549%)', 'td_6 - 2023-03-05 01:33:30', 'td_7 - Trade']
    ['Symbol - EOSUSDT', 'td_count - 7', 'td_1 - EOSUSDT Perpetual Short20x', 'td_2 - 138526.5', 'td_3 - 1.272', 'td_4 - 1.175', 'td_5 - 13,383.85\xa0(164.4547%)', 'td_6 - 2023-03-04 02:42:13', 'td_7 - Trade']
    ['Symbol - COCOSUSDT', 'td_count - 7', 'td_1 - COCOSUSDT Perpetual Short10x', 'td_2 - 33878.5', 'td_3 - 2.263120', 'td_4 - 1.534000', 'td_5 - 24,701.49\xa0(475.3063%)', 'td_6 - 2023-03-03 04:20:52', 'td_7 - Trade']
    ['Symbol - SSVUSDT', 'td_count - 7', 'td_1 - SSVUSDT Perpetual Short10x', 'td_2 - 1010.3', 'td_3 - 44.252249', 'td_4 - 38.808423', 'td_5 - 5,499.90\xa0(140.2743%)', 'td_6 - 2023-03-03 17:35:15', 'td_7 - Trade']
    
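    Note: instead of a fixed time.sleep(10), you can wait explicitly until the table rows have been rendered. A short sketch of that variant, meant to replace the sleep inside get_result (it assumes the same bn-table-tbody class used above):

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # wait up to 20 seconds until at least one leaderboard row is present,
    # then read the rendered page source as before
    WebDriverWait(driver, 20).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'tbody.bn-table-tbody tr'))
    )
    html = driver.page_source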

    You can process the final data as you like.
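    For example, since get_result returns the collected list of rows, a small follow-up sketch that also answers the "how many tr tags" part of the question:

    rows = get_result(url=url)
    print(f'tr count - {len(rows)}')  # number of <tr> rows found in the tbody
    symbols = [row[0].replace('Symbol - ', '') for row in rows]
    print(symbols)  # e.g. ['SOLUSDT', 'ETHUSDT', 'EOSUSDT', 'COCOSUSDT', 'SSVUSDT']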