python selenium selenium-webdriver web-scraping selenium-chromedriver

Selenium Python function find_elements_by_css_selector() not returning expected data


I am new to Selenium and am trying to scrape data (just names for now) from the bourbon product cards on thewhiskyexchange.com. I have tested all of my CSS (and XPath) selectors in the Scrapy shell, so I know they are correct, but the output is coded information about the "session" and the element that I do not understand. The number of items in the list seems to be correct, so maybe Selenium is doing exactly what it is supposed to do and I just don't know how to convert the output into something I can use. How do I get just the names from the product cards?

I have tried both the driver and the local selector functions Selenium offers, with the same results. Beautiful Soup functions return the data I need, but that method is too inefficient for the scope of the project I am working on. Any insight into how I can fix this would be greatly appreciated.

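IN[]:
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = Options()
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--window-size=1920,1080")
chrome_options.binary_location = r"C:\Program Files\Google\Chrome\Application\chrome.exe"  # raw string keeps the backslashes literal

IN[]:
driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)  # pass the options so they are actually applied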

IN[]:
url = "https://www.thewhiskyexchange.com/c/639/bourbon-whiskey"
driver.get(url)
time.sleep(5)  # 5-second delay to let the page finish rendering
html = driver.page_source
html  # the page source (an HTML string) looks as expected

IN[]:
els = driver.find_elements_by_css_selector('p.product-card__name')
# equivalent call using a By locator: els = driver.find_elements(By.CSS_SELECTOR, 'p.product-card__name')
els

OUT[]:
[<selenium.webdriver.remote.webelement.WebElement (session="e521768d8df1dd788b1fda816299b0b5", element="b9384a19-f8c9-46b2-be99-780200dcba99")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e521768d8df1dd788b1fda816299b0b5", element="af76dfa8-b86c-426a-8ad8-30ea904ed11b")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e521768d8df1dd788b1fda816299b0b5", element="58b14e5a-6bc3-443a-807f-ec696e83b096")>, ...

Solution

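  • find_elements returns a list of web elements, whereas find_element returns a single web element. The output you are seeing is the default representation of those WebElement objects, not an error; each element's visible text is available through its text attribute.

    You can iterate over the list and extract the text as shown below: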

    IN[]:
    from selenium.webdriver.common.by import By

    els = driver.find_elements(By.CSS_SELECTOR, 'p.product-card__name')
    for e in els:
        print(e.text)  # visible text of the element, i.e. the product name
    

    Also, note that find_elements_by_css_selector was deprecated in Selenium 4 and removed in later 4.x releases (4.3.0 onward), so use find_elements(By.CSS_SELECTOR, "") instead.
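
If you want the names in a list rather than printed, a small helper along these lines may be enough. This is only a sketch: element_texts is a hypothetical name, not part of Selenium, and it relies on nothing more than each element exposing a text attribute, as the WebElement objects returned by find_elements do. The stand-in objects and product names are made up so that the snippet runs without a browser.

```python
from types import SimpleNamespace


def element_texts(elements):
    """Return the stripped, non-empty .text of each element, in order."""
    texts = []
    for element in elements:
        text = (element.text or "").strip()
        if text:
            texts.append(text)
    return texts


# Stand-ins for WebElement objects so the sketch runs without a browser;
# the product names here are invented for illustration.
fake_elements = [
    SimpleNamespace(text="  Example Bourbon A "),
    SimpleNamespace(text=""),
    SimpleNamespace(text="Example Bourbon B"),
]
print(element_texts(fake_elements))  # ['Example Bourbon A', 'Example Bourbon B']
```

With a live driver, the call would presumably look like names = element_texts(driver.find_elements(By.CSS_SELECTOR, 'p.product-card__name')). Empty strings are skipped because text only reports what is visibly rendered, so a card that is hidden or not yet rendered can come back empty.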