Tags: python, html, selenium-webdriver, web-scraping

How to scrape data from an arbitrary number of row listings using Python Selenium?


So I'm trying to create a bot that identifies NFT loan listings on Blur that meet certain criteria, such as a loan's total value being 80% or less of the collection's floor price, or its APY being greater than 100%. I've figured out the basics of loading up Chrome with Selenium and navigating to the correct section of the website to view a collection's loans. But I'm struggling to actually extract the data from the table of loans. What I'd like to do is extract the table of loan listings into an array of arrays or an array of dictionaries, with each array/dictionary holding one loan's name, status, borrow amount, LTV, and APY.
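For example, something like this (the field names and values are just made up to illustrate the shape I'm after):

loans = [
    {"name": "BEANZ #1234", "status": "Active", "borrow_amount": "0.45", "ltv": "78%", "apy": "120%"},
    {"name": "BEANZ #5678", "status": "Active", "borrow_amount": "0.30", "ltv": "52%", "apy": "95%"},
]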

What I have working thus far:

import selenium
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
import time

path = "/Users/#########/Desktop/chromedriver-mac-arm64/chromedriver"

# Create an instance of ChromeOptions
options = Options()
options.add_experimental_option("detach", True)
options.add_argument("disable-infobars")

# Specify the path to the ChromeDriver
service = Service(path)

# Initialize the WebDriver with the service and options
driver = webdriver.Chrome(service=service, options=options)

# Open Blur beanz collection and navigating to active loans page
driver.maximize_window
driver.get("https://blur.io/eth/collection/beanzofficial/loans")
time.sleep(3)
loan_button = driver.find_element(By.XPATH, "/html/body/div/div/main/div/div[3]/div/div[2]/div[1]/div[1]/nav/button[2]")
loan_button.click()


I'm honestly new to Selenium, so I've just been toying around with my intuition and ChatGPT trying to solve this. My best guess so far was the following bit of code, which tried to extract the APY of all the loans. It did not work, as I'm sure there was some faulty intuition.

elements = driver.find_elements(By.CSS_SELECTOR, 'Text-sc-m23s7f-0 hAGCAO')


# Initialize an empty list to store the percentage values
percentages = []

# Iterate through each element and extract its text (which contains the percentage)
for element in elements:
    percentage = element.text
    percentages.append(percentage)

# Print the extracted percentage values
print(percentages)

time.sleep(10)
# Close the WebDriver
driver.quit()

I also feel like this is a bit complex, since it extracts the table one column at a time rather than one row at a time. Not sure if there is a simpler way to do this; if there is, that would be great. If not, okay too!


Solution

  • I would recommend searching "how to locate elements with Selenium" and doing some reading. But maybe this will get you started...

    Your XPATH to select the "ALL LOANS" button is /html/body/div/div/main/div/div[3]/div/div[2]/div[1]/div[1]/nav/button[2]--it's clear you got this by clicking "Copy XPath" in the developer console. This is generally not a good approach: if anything about the page structure changes, your code will break (imagine a developer deciding to add or remove a div anywhere in that hierarchy). Instead, try to find a unique way to select the element that is unlikely to change. Selecting by ID (e.g. driver.find_element(By.ID, "main-menu")) is preferred when possible. On this page, the "ALL LOANS" button has neither an ID nor a unique class name--in that case, I prefer using text to locate the element. My recommended XPATH would be: //button[.='All Loans']. This XPATH means: select every <button> element anywhere on the page whose string content equals "All Loans". (Note that this text matching IS case sensitive--even though "ALL LOANS" displays uppercase on the page, if you examine the HTML you can see the actual capitalization of the text; the uppercase rendering comes from CSS.)
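    To make the comparison concrete, here is a quick sketch of the three locator styles (the ID in the first line is hypothetical--this button doesn't actually have one):

    # Preferred when available: a stable ID (hypothetical here--this button has none)
    driver.find_element(By.ID, "all-loans-button")

    # Brittle: auto-generated styled-components class names change between deploys.
    # (This is also why the CSS attempt in the question found nothing: class
    # selectors need leading dots, i.e. '.Text-sc-m23s7f-0.hAGCAO',
    # not 'Text-sc-m23s7f-0 hAGCAO'.)
    driver.find_element(By.CSS_SELECTOR, ".Text-sc-m23s7f-0.hAGCAO")

    # Robust: match on the visible text, which rarely changes
    driver.find_element(By.XPATH, "//button[.='All Loans']")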

    To test an XPATH in Chrome, open the HTML viewer (Elements tab of the developer console) and hit CTRL+F to open the find bar, then enter your XPATH (without enclosing quotes). You will see how many elements the XPATH locates. (You can also evaluate an XPATH from the Console tab, e.g. $x("//button[.='All Loans']").)

    For selecting the loans, you can use this XPATH: //div[@id='COLLECTION_MAIN']//div[@role='rowgroup']//div[@role='row']. First it locates the <div> with an id of 'COLLECTION_MAIN'--this is the center area of the screen. Then it locates the descendant "rowgroup" div that contains the loans, and finally all the row divs themselves. Try playing around with this XPATH--if you remove either of the first two components of the selector, it will no longer work correctly, because it will also match additional divs with the 'row' role elsewhere on the page.
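    You can run the same sanity check from Python by counting the matches (this assumes the table has already rendered):

    rows_xpath = "//div[@id='COLLECTION_MAIN']//div[@role='rowgroup']//div[@role='row']"
    print(len(driver.find_elements(By.XPATH, rows_xpath)))  # should match the number of visible loan rows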

    You can then iterate over these rows to get any details you want.

    Putting it together:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    ...
    
    options = Options()
    options.add_experimental_option("detach", True)
    # disable the message "Chrome being controlled by automated test software"
    options.add_experimental_option("excludeSwitches",["enable-automation"])
    
    driver = webdriver.Chrome(service=service, options=options)
    
    # Open Blur beanz collection and navigating to active loans page
    driver.maximize_window()  # fixed typo
    driver.get("https://blur.io/eth/collection/beanzofficial/loans")
    # wait until element is clickable then click it
    loans_button = WebDriverWait(driver, 20).until(
        EC.element_to_be_clickable((By.XPATH, "//button[.='All Loans']")))
    loans_button.click()
    
    percentages = []
    
    # wait for table to load
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "//div[@id='COLLECTION_MAIN']//div[@role='rowgroup']//div[@role='row']")))
    
    for loan_row in driver.find_elements(By.XPATH, "//div[@id='COLLECTION_MAIN']//div[@role='rowgroup']//div[@role='row']"):
        apy = loan_row.find_element(By.XPATH, "div[5]").text  # select the 5th column (APY)
        percentages.append(apy)  # could use float(apy[:-1]) to convert to a number
    
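    If you want the whole row as a dictionary, as described in the question, you could extend the loop along these lines (a sketch that assumes the columns appear in the order name, status, borrow amount, LTV, APY--verify the indices against the live table):

    loans = []
    for loan_row in driver.find_elements(By.XPATH, "//div[@id='COLLECTION_MAIN']//div[@role='rowgroup']//div[@role='row']"):
        cells = loan_row.find_elements(By.XPATH, "div")  # one child <div> per column
        loans.append({
            "name": cells[0].text,           # assumed column order -- check against the page
            "status": cells[1].text,
            "borrow_amount": cells[2].text,
            "ltv": cells[3].text,
            "apy": cells[4].text,
        })
    print(loans)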

    Edit: Andrej's answer is far superior in terms of ease and speed--using Selenium for data scraping should really be a last resort