Good day,
I am pretty new to Python and Selenium, and need help with the following issue:
A snippet of my code is as follows:
    num_page_items = len(date)
    blank = "0"
    try:
        with open('results.csv', 'a') as f:
            for i in range(num_page_items):
                f.write(name[i].text + "#" + surname[i].text + "#" + ref[i].text + "#" + url[i].text + "\n")
    except IndexError:
        with open('results.csv', 'a') as f:
            f.write(blank)
I have a few variables that hold elements scraped from a website using Selenium. An example of the data and the expected output is as follows:
Name: Joe Surname: Soap Ref: 1234 URL: www.example.com
Name: Bill Surname: Smith Ref: 4567 URL: www.dot.com
When all elements are present the Python script works well; however, when one element doesn't exist (in the example, Ref is missing from the second entry), the output is as follows:
output when an element doesn't exist
What can I do to set the variable to "Null" when the element doesn't exist on the webpage, so that the expected new output would be as follows:
expected output when element doesn't exist
Just as a side note, the error I receive isn't a Selenium exception but an IndexError, hence the except IndexError clause.
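One way to get that "Null" behaviour is to guard each field lookup individually instead of wrapping the whole write loop in one try/except, so a missing element only blanks its own column rather than aborting the row. A minimal sketch, where FakeElement (a stand-in for a Selenium WebElement) and safe_text are names made up for illustration:

```python
class FakeElement:
    """Stand-in for a Selenium WebElement, which exposes a .text attribute."""
    def __init__(self, text):
        self.text = text

def safe_text(elements, i, default="Null"):
    """Return elements[i].text, or default when the index is missing."""
    try:
        return elements[i].text
    except IndexError:
        return default

name = [FakeElement("Joe"), FakeElement("Bill")]
ref = [FakeElement("1234")]  # second Ref element missing on the page

for i in range(len(name)):
    print(safe_text(name, i) + "#" + safe_text(ref, i))
# Joe#1234
# Bill#Null
```

The same pattern drops into the existing write loop: replace each `name[i].text` with `safe_text(name, i)` and the IndexError never reaches the file-writing code.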
EDIT - Felipe Gutierrez's Suggestion
A larger piece of the code with Felipe's suggestion:
    for url in links:
        driver.get(url)  # goes to the array and opens each link
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        ref = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[3]""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url
        num_page_items = len(date)
        blank = "blank"
        for ref in ref:
            if ref is None:
                ref = 0
        with open('results.csv', 'a') as f:
            for i in range(num_page_items):
                f.write(company[i].text + "#" + date[i].text + "#" + ref[i].text + "#" + title[i].text + "#" + urlinf + "\n")
        driver.close()
I now get the following error:
    Traceback (most recent call last):
      File "accc_for_loop_nest.py", line 50, in
        f.write(company[i].text + "#" + date[i].text + "#" + ref[i].text + "#" + title[i].text + "#" + urlinf + "\n")
    TypeError: 'WebElement' object does not support indexing
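(For reference, this TypeError is caused by the `for ref in ref:` line: the loop variable reuses the name `ref`, so after the loop `ref` is bound to the last single WebElement rather than the list, and `ref[i]` then tries to index a WebElement. A minimal illustration, with a plain class standing in for WebElement:)

```python
class Item:
    """Stand-in for a Selenium WebElement (not indexable)."""
    pass

items = [Item(), Item()]
for items in items:  # the loop variable shadows and replaces the list
    pass

# items is now the last Item, not a list, so indexing it fails:
try:
    items[0]
except TypeError as e:
    print(e)  # 'Item' object is not subscriptable
```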
You lose the index of the lists you are iterating over with the try/except. You can test for the missing values before the insertion loop and assign a zero to the list at that specific place, then do the insertion without the exception handling. Something like:
    for url in links:
        driver.get(url)  # goes to the array and opens each link
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        ref = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[3]""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url
        num_page_items = len(date)
        blank = "blank"
        companyStrings = []
        dateStrings = []
        refStrings = []
        titleStrings = []
        with open('results.csv', 'a') as f:
            for i in range(num_page_items):
                companyStrings.append(company[i].text)
                dateStrings.append(date[i].text)
                refStrings.append(ref[i].text)
                titleStrings.append(title[i].text)
                if companyStrings[i] == '':
                    companyStrings[i] = '0'
                if dateStrings[i] == '':
                    dateStrings[i] = '0'
                if refStrings[i] == '':
                    refStrings[i] = '0'
                if titleStrings[i] == '':
                    titleStrings[i] = '0'
                f.write(companyStrings[i] + "#" + dateStrings[i] + "#" + refStrings[i] + "#" + titleStrings[i] + "#" + urlinf + "\n")
        driver.close()
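Note that the empty-string checks above only catch elements that exist but have no text; if `ref` simply has fewer matches than `date` on the page, `ref[i].text` will still raise IndexError. One way to handle that case is to pad the shorter text lists to the same length before the write loop — a sketch with plain strings standing in for the text already extracted from the elements, and `pad` a made-up helper name:

```python
def pad(lst, n, filler="Null"):
    """Extend lst with filler entries until it has n items."""
    return lst + [filler] * (n - len(lst))

# Text values as they would look after extracting .text from each element:
dateStrings = ["2020-01-01", "2020-01-02"]
refStrings = ["1234"]  # second entry missing on the page

num_page_items = len(dateStrings)
refStrings = pad(refStrings, num_page_items)
print(refStrings)  # ['1234', 'Null']
```

With every list padded to `num_page_items` entries, the write loop can index them safely and the missing field appears as "Null" in its column.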