python selenium-webdriver web-scraping xpath

How can I print out all text in a web table column in Python using Selenium?


I am attempting to use a for loop in Python to print out the text in a web table column using the XPath expression of all cells in the column. The XPath expression is similar to this:

//*[@id="webTable"]/tbody/tr[2]/td[6]

The for loop I am using is written like this:

for x in range(totalRows):
    y = driver.find_elements(by = By.XPATH, value = '//*[@id="webTable"]/tbody/tr[' + str(x) + ']/td[6]')
    print(y)

However, when I run the program, this is the output that I get:

[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="c488195e-8751-43c8-9d01-6e873cb2cc4a")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="70f9ad39-4bdd-4bcf-b869-c31968de4492")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="f8fd427e-2bd3-4995-8b24-7cb7bda14f1a")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="0541eb71-24a1-44e9-bb9d-bacc63426bad")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="b19a839e-a6c1-43f2-bcf1-1f0692ff2c0f")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="b427383a-31a5-49f8-a466-62fb5a489047")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="1cd4bd3f-6e7f-4a89-950e-0f5dab47eabd")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="5c964e47-2fff-4c4d-9743-eecbd1c7bea6")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="54ff1ef7-0693-43e2-939e-c387f8f20e06")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="21a63bd7-7dc5-4860-bfb2-1309a842c2f7")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="aee78709-f4ee-4e0f-8cb7-6c3114b52fba")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="28ef515e-4c66-472b-8126-76793eeebee2")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="2fb995ff-9100-4124-9efe-f8c2bfe49767")>]

I tried writing the for loop like this:

for x in range(totalRows):
    y = driver.find_elements(by = By.XPATH, value = '//*[@id="webTable"]/tbody/tr[' + str(x) + ']/td[6]')
    print(y.text)

and:

for x in range(totalRows):
    y = driver.find_elements(by = By.XPATH, value = '//*[@id="webTable"]/tbody/tr[' + str(x) + ']/td[6]').text
    print(y)

but when I write it like that, I receive this error:

AttributeError: 'list' object has no attribute 'text'

How else can I extract the text within the cells?


Solution

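  • find_elements returns a list of WebElement objects, even when only one element matches, and a list has no text attribute. Read .text from each element instead. Note also that XPath positions start at 1, so tr[0] in the first pass of range(totalRows) matches nothing. Here is a solution that walks the rows and collects one column: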

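    from selenium.webdriver.common.by import By

    # `driver` is the already-running WebDriver from the question
    table = driver.find_element(by=By.XPATH, value='//*[@id="webTable"]/tbody')
    rows = table.find_elements(by=By.TAG_NAME, value="tr")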
    
    # Zero-based index of the column to read; 5 is the sixth column (td[6] in the question)
    desired_column = 5
    desired_column_data = []
    
    for row in rows:
        columns = row.find_elements(by=By.TAG_NAME, value='td')
    
        # Skip rows (for example header rows) with too few <td> cells
        if len(columns) > desired_column:
            desired_column_data.append(columns[desired_column].text)
    
    print(desired_column_data)
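
  • A shorter alternative, offered as a sketch rather than a definitive implementation: dropping the row index from the XPath in the question (tr instead of tr[2]) should match that cell in every row with a single find_elements call. The helper name column_texts and its parameters are hypothetical, and the sketch assumes the table keeps the //*[@id="..."]/tbody/tr/td[n] layout shown in the question.

```python
from typing import List

# By.XPATH from selenium.webdriver.common.by is the plain string "xpath";
# using the string keeps this helper importable without Selenium installed.
XPATH = "xpath"


def column_texts(driver, table_id: str, column: int) -> List[str]:
    """Return the text of every <td> in a 1-based column of the table body.

    `column_texts` is a hypothetical helper name, not part of Selenium.
    """
    # Omitting the row index on `tr` matches the cell in every row at once.
    xpath = f'//*[@id="{table_id}"]/tbody/tr/td[{column}]'
    cells = driver.find_elements(XPATH, xpath)
    # find_elements returns a list, so read .text from each element.
    return [cell.text for cell in cells]
```

    With the driver from the question, column_texts(driver, "webTable", 6) would return the sixth column as a list of strings, which can then be printed in a loop or all at once.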