python selenium web-scraping selenium-chromedriver screen-scraping

'Suspected' Odd behavior from output of Selenium using Python


Good day. I am running the following snippet and see behavior that I am not sure is correct:

for url in links:
        driver.get(url)
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        secref1 = driver.find_elements_by_xpath("""/html/body/div[3]/div/section/div[2]/div/section/div/section/div/article/div[1]/div[3]/div[2]/div""")
        secref2 = driver.find_elements_by_xpath("""/html/body/div[3]/div/section/div[2]/div/section/div/section/div/article/div[1]/div[4]/div[2]/div""")

        if not secref2:
            secref2.append("Null")
        else:
            secref2 = secref2

        num_page_items = len(date)

        for i in range(num_page_items):
            print secref2

driver.close()

I expect "secref2" to be missing from some webpages, hence the if/else.

My output when running the script is as follows:

DevTools listening on ws://127.0.0.1:64592/devtools/browser/da7ab0e6-e0e9-4edb-963a-913b38c6f4dd
['Null']
[<selenium.webdriver.remote.webelement.WebElement (session="a7bc63bef087357d1510c3b28ec8db87", element="0.14518628426304736-4")>]
[<selenium.webdriver.remote.webelement.WebElement (session="a7bc63bef087357d1510c3b28ec8db87", element="0.6063690703515521-4")>]
[<selenium.webdriver.remote.webelement.WebElement (session="a7bc63bef087357d1510c3b28ec8db87", element="0.16122194044687665-7")>]
[<selenium.webdriver.remote.webelement.WebElement (session="a7bc63bef087357d1510c3b28ec8db87", element="0.7547639796767653-4")>]
[<selenium.webdriver.remote.webelement.WebElement (session="a7bc63bef087357d1510c3b28ec8db87", element="0.768240568661338-16")>]
[<selenium.webdriver.remote.webelement.WebElement (session="a7bc63bef087357d1510c3b28ec8db87", element="0.3077014556092601-4")>]
[<selenium.webdriver.remote.webelement.WebElement (session="a7bc63bef087357d1510c3b28ec8db87", element="0.9689075758046188-4")>]
[<selenium.webdriver.remote.webelement.WebElement (session="a7bc63bef087357d1510c3b28ec8db87", element="0.09545508090332766-4")>]
[<selenium.webdriver.remote.webelement.WebElement (session="a7bc63bef087357d1510c3b28ec8db87", element="0.068763767350847-4")>]

I see the first "Null"; however, the subsequent entries look to be some sort of output.

If I try:

        for i in range(num_page_items):
            print secref2[i].text

I get the following error:

DevTools listening on ws://127.0.0.1:64788/devtools/browser/df696310-30cf-4833-89fa-fac28e6b3bb0
Traceback (most recent call last):
  File "test.py", line 54, in <module>
    print secref2[i].text
AttributeError: 'str' object has no attribute 'text'

Any help with this would be appreciated.


Solution

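  • You're iterating at two levels: the outer loop visits each URL, and the inner loop prints once per page item. For the first URL, secref2 comes back empty, so you append the string "Null" to it. For subsequent URLs you get a list of WebElement objects, and printing the list shows their default representation. You can't print secref2[i].text because on that first URL secref2[0] is "Null", and "Null" is a str, which has no .text attribute.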

    Did you mean to assign something else to the variable here? I don't know why you'd assign the variable to itself.

    else:
        secref2 = secref2
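
One way to avoid mixing strings and WebElements in the same list is to convert the elements to text first and only fall back to a placeholder when nothing was found. The following is a minimal sketch, not the asker's actual code: texts_or_default and FakeElement are hypothetical names introduced here, and FakeElement only stands in for a Selenium WebElement so the example runs without a browser.

```python
def texts_or_default(elements, default="Null"):
    """Return the .text of each element, or [default] if the list is empty."""
    if not elements:
        return [default]
    return [element.text for element in elements]


class FakeElement(object):
    """Hypothetical stand-in for a Selenium WebElement (demo only)."""

    def __init__(self, text):
        self.text = text


# An empty result (element missing from the page) yields the placeholder.
print(texts_or_default([]))

# A non-empty result yields plain strings, so no .text call is needed later.
print(texts_or_default([FakeElement("REF-1"), FakeElement("REF-2")]))
```

In the original loop, this would presumably replace the if/else: call texts_or_default(secref2) once per URL and iterate over the returned list directly, rather than indexing it with range(len(date)), since date and secref2 may not have the same length.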