python selenium web-scraping xpath list-comprehension

Find element by XPath: how to exclude an unwanted nested element


I'm trying to scrape a website using Selenium. I ran into a problem when getting the coin names, because there are 2 elements inside each 'td'. How can I get rid of the element I don't want, or keep only the first element? (I found this post, but I'm not sure whether it answers my question.)

This is my whole code

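from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

# Chrome driver setup
website = 'https://www.bitkub.com/fee/cryptocurrency'
# Not used below: ChromeDriverManager downloads and locates the driver itself
path = r"C:\Users\USER\Downloads\chromedriver.exe"
options = Options()
options.add_argument("start-maximized")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get(website)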

# Collect the text of each table column once its elements are visible
coin_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//tbody//tr//td[2]//span")))]
chain_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//tbody//tr//td[3]//div")))]
withdrawal_fees = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//tbody//tr//td[4]//div")))]
#print(coin_name)
#print(chain_name)
#print(withdrawal_fees)


# Print one line per coin
for coin, chains, wdf in zip(coin_name, chain_name, withdrawal_fees):
    print("Coin name: {} Chain: {} Fee: {}".format(coin, chains, wdf))

The output of coin_name (which, as I mentioned, contains 2 entries per coin):

['Civic(CVC)', '(CVC)', 'Bitcoin SV(BSV)', '(BSV)', 'Ethereum(ETH)', '(ETH)', 'Bitkub Coin(KUB)', '(KUB)', 'Compound(COMP)', '(COMP)', 'Curve DAO Token(CRV)', '(CRV)', .... ]

This is what the element looks like on the website: [screenshot]

I want the output to look like this so I can build a DataFrame from it:

['Civic(CVC)', 'Bitcoin SV(BSV)', 'Ethereum(ETH)', 'Bitkub Coin(KUB)', 'Compound(COMP)', 'Curve DAO Token(CRV)', .... ]

Solution

  • As per your current output:

    ['Civic(CVC)', '(CVC)', 'Bitcoin SV(BSV)', '(BSV)', 'Ethereum(ETH)', '(ETH)', 'Bitkub Coin(KUB)', '(KUB)', 'Compound(COMP)', '(COMP)', 'Curve DAO Token(CRV)', '(CRV)', .... ]
    

    You can skip every alternate element and create a new list using a list comprehension as follows:

    # coin_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//tbody//tr//td[2]//span")))]
    coin_name = ['Civic(CVC)', '(CVC)', 'Bitcoin SV(BSV)', '(BSV)', 'Ethereum(ETH)', '(ETH)', 'Bitkub Coin(KUB)', '(KUB)', 'Compound(COMP)', '(COMP)', 'Curve DAO Token(CRV)', '(CRV)']
    res = [name for i, name in enumerate(coin_name) if i % 2 == 0]  # keep indices 0, 2, 4, ...
    print(res)
    

    Console Output:

    ['Civic(CVC)', 'Bitcoin SV(BSV)', 'Ethereum(ETH)', 'Bitkub Coin(KUB)', 'Compound(COMP)', 'Curve DAO Token(CRV)']
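
The approach above relies on the unwanted entries sitting at every odd index. As a minimal sketch of an alternative, assuming (from the output shown in the question) that every unwanted entry is just the ticker in parentheses, such as '(CVC)', you could filter on content rather than position. The name full_names is introduced here only for illustration.

```python
# Sample of the list returned by the original XPath (copied from the question).
coin_name = ['Civic(CVC)', '(CVC)', 'Bitcoin SV(BSV)', '(BSV)',
             'Ethereum(ETH)', '(ETH)', 'Bitkub Coin(KUB)', '(KUB)',
             'Compound(COMP)', '(COMP)', 'Curve DAO Token(CRV)', '(CRV)']

# Assumption: an unwanted entry is a bare ticker, so it starts with "(".
# Full names such as 'Civic(CVC)' start with the coin name and are kept.
full_names = [name for name in coin_name if not name.startswith("(")]

print(full_names)
```

This does not depend on the two kinds of entry strictly alternating, but it would also drop a genuine coin name that happens to begin with "(", so check that assumption against the real data before relying on it.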