Tags: selenium-webdriver, web-scraping, xpath, css-selectors, webdriverwait

Capture all data in tr tag within binance.com using Python Selenium


I am unable to read all the data in the tbody tag on the Binance futures leaderboard page using Python Selenium. I am trying to scrape this link: https://www.binance.com/en/futures-activity/leaderboard/user/um?encryptedUid=14507FCBFF9FBE584EDDEC628C4593B8

I used the command below:

tr = driver.find_elements(By.TAG_NAME,'tbody')

but there is no text output.

I'm trying to get all the data from the tr tags under the tbody tag into an array or a list object. I also need to know how many tr tags are on the page.


Solution

  • For your task, you can use Selenium + BeautifulSoup: open the page in Selenium, wait for it to load, and then parse the rendered page source into a 'soup' object. First we find the 'tbody', then we search for all 'tr' elements, and for each 'tr' we find all its 'td' elements. We extract the data and write it to a list: the first element is the 'Symbol', the second is the number of 'td' cells in that row, followed by the data from each cell. Code:

    from bs4 import BeautifulSoup
    from selenium import webdriver
    import time
    from selenium.webdriver.chrome.options import Options
    
    url = 'https://www.binance.com/en/futures-activity/leaderboard/user/um?encryptedUid=14507FCBFF9FBE584EDDEC628C4593B8'
    
    
    def get_result(url):
        chrome_options = Options()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--no-sandbox')
        driver = webdriver.Chrome(options=chrome_options, executable_path=".../chromedriver_linux64/chromedriver")  # the path to your chromedriver
        driver.get(url)
        time.sleep(10)  # give the JavaScript-rendered table time to load
        html = driver.page_source
        soup = BeautifulSoup(html, "lxml")
        tbody = soup.find('tbody', class_='bn-table-tbody')
        trs = tbody.find_all('tr')
        data = list()
        for tr in trs:
            tr_key = tr.get('data-row-key')
            if tr_key is None:
                continue
            mid_data = list()
            mid_data.append(f'Symbol - {tr_key}')
            tds = tr.find_all('td')
            mid_data.append(f'td_count - {len(tds)}')
            for count, td in enumerate(tds, start=1):
                mid_data.append(f'td_{count} - {td.text}')
            print(mid_data)
            data.append(mid_data)
        driver.quit()
        return data
        
    
    def main():
        get_result(url=url)
    
    
    if __name__ == "__main__":
        main()
    

    Will return:

    ['Symbol - SOLUSDT', 'td_count - 7', 'td_1 - SOLUSDT Perpetual Short20x', 'td_2 - 7036', 'td_3 - 21.9520', 'td_4 - 20.4050', 'td_5 - 10,884.64\xa0(151.6288%)', 'td_6 - 2023-03-03 17:01:49', 'td_7 - Trade']
    ['Symbol - ETHUSDT', 'td_count - 7', 'td_1 - ETHUSDT Perpetual Short30x', 'td_2 - 385.383', 'td_3 - 1,562.54', 'td_4 - 1,564.50', 'td_5 - -754.66\xa0(-3.7549%)', 'td_6 - 2023-03-05 01:33:30', 'td_7 - Trade']
    ['Symbol - EOSUSDT', 'td_count - 7', 'td_1 - EOSUSDT Perpetual Short20x', 'td_2 - 138526.5', 'td_3 - 1.272', 'td_4 - 1.175', 'td_5 - 13,383.85\xa0(164.4547%)', 'td_6 - 2023-03-04 02:42:13', 'td_7 - Trade']
    ['Symbol - COCOSUSDT', 'td_count - 7', 'td_1 - COCOSUSDT Perpetual Short10x', 'td_2 - 33878.5', 'td_3 - 2.263120', 'td_4 - 1.534000', 'td_5 - 24,701.49\xa0(475.3063%)', 'td_6 - 2023-03-03 04:20:52', 'td_7 - Trade']
    ['Symbol - SSVUSDT', 'td_count - 7', 'td_1 - SSVUSDT Perpetual Short10x', 'td_2 - 1010.3', 'td_3 - 44.252249', 'td_4 - 38.808423', 'td_5 - 5,499.90\xa0(140.2743%)', 'td_6 - 2023-03-03 17:35:15', 'td_7 - Trade']
    
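    Note: instead of a fixed time.sleep(10), you can wait explicitly until the table rows have been rendered. A short sketch of that variant, meant to replace the sleep inside get_result (it assumes the same bn-table-tbody class used above):

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # wait up to 20 seconds until at least one leaderboard row is present,
    # then read the rendered page source as before
    WebDriverWait(driver, 20).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'tbody.bn-table-tbody tr'))
    )
    html = driver.page_source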

    You can process the final data as you like.
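    For example, since get_result returns the collected list of rows, a small follow-up sketch that also answers the "how many tr tags" part of the question:

    rows = get_result(url=url)
    print(f'tr count - {len(rows)}')  # number of <tr> rows found in the tbody
    symbols = [row[0].replace('Symbol - ', '') for row in rows]
    print(symbols)  # e.g. ['SOLUSDT', 'ETHUSDT', 'EOSUSDT', 'COCOSUSDT', 'SSVUSDT']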