python, selenium, web-scraping, index-error

While Python is writing to CSV, the script inserts a new line in the try/except block in the CSV file


Good day,

I am pretty new to Python and Selenium, and need help with the following issue:

A snippet of my code is as follows:

num_page_items = len(date)
blank = "0"
try:
    with open('results.csv', 'a') as f:
        for i in range(num_page_items):
            f.write(name[i].text + "#" + surname[i].text + "#" + ref[i].text + "#" + url[i].text + "\n")
except IndexError:
    with open('results.csv', 'a') as f:
        f.write(blank)

I have a few variables that are scraped from a website using Selenium. An example of the data and the expected output is as follows:

Name: Joe Surname: Soap Ref: 1234 URL: www.example.com

Name: Bill Surname: Smith Ref: 4567 URL: www.dot.com

[screenshot: expected output]

When all elements are present, the Python script works well. However, when an element is missing (in the example, Ref doesn't exist in the second entry), the output is as follows:

[screenshot: output when an element doesn't exist]

What can I do to set the variable to "Null" when the element doesn't exist on the webpage, so that the new output would look as follows:

[screenshot: expected output when an element doesn't exist]

Just as a side note, the error I receive isn't a Selenium exception but an IndexError, hence the except IndexError statement.
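
To make the goal a bit more concrete, something along the lines of the sketch below is what I have in mind (text_or_null is just a name I made up), though I don't know whether this is the right way to do it:

def text_or_null(elements, i):
    # made-up helper: return the text of the i-th scraped element,
    # or "Null" when that element wasn't found on the page
    return elements[i].text if i < len(elements) else "Null"

with open('results.csv', 'a') as f:
    for i in range(num_page_items):
        f.write(text_or_null(name, i) + "#" + text_or_null(surname, i) + "#"
                + text_or_null(ref, i) + "#" + text_or_null(url, i) + "\n")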

EDIT - Felipe Gutierrez's Suggestion

Here is a larger piece of the code with Felipe's suggestion:

for url in links:
        driver.get(url) #goes to the array and opens each link

        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""") 
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        ref = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[3]""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url

        num_page_items = len(date)
        blank = "blank"

        for ref in ref:
            if ref is None:
                ref = 0

        with open('results.csv', 'a') as f:
            for i in range(num_page_items):
                f.write(company[i].text + "#" + date[i].text + "#" + ref[i].text + "#" + title[i].text + "#" + urlinf + "\n")

driver.close()

I now get the following error:

Traceback (most recent call last):
  File "accc_for_loop_nest.py", line 50, in <module>
    f.write(company[i].text + "#" + date[i].text + "#" + ref[i].text + "#" + title[i].text + "#" + urlinf + "\n")
TypeError: 'WebElement' object does not support indexing
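
For what it's worth, I can reproduce the same kind of error with the stripped-down snippet below (FakeElement is just a stand-in for a WebElement), which makes me think the for ref in ref loop is rebinding ref from the list to a single element:

class FakeElement:
    # stand-in for a Selenium WebElement, which cannot be indexed either
    def __init__(self, text):
        self.text = text

ref = [FakeElement("1234"), FakeElement("4567")]  # what find_elements_by_xpath gives back

for ref in ref:          # same pattern as in my code above
    if ref is None:
        ref = 0

# after the loop, ref is the *last* FakeElement, not the list any more
print(type(ref))         # <class '__main__.FakeElement'>
print(ref[0].text)       # raises the same kind of TypeError as ref[i] in my script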


Solution

  • You lose the index of the lists you are iterating on with the try/except. Instead, you can test for the missing values before the insertion loop and assign a zero to the list at that specific place, then do the insertion without the exception handling. (The TypeError in your edit comes from the for ref in ref loop: after it, ref is bound to a single WebElement rather than the list, so ref[i] no longer works; the code below drops that loop.) Something like:

    for url in links:
        driver.get(url) #goes to the array and opens each link
    
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""") 
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        ref = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[3]""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url
    
        num_page_items = len(date)
        blank = "blank"
    
        companyStrings = []
        dateStrings = []
        refStrings = []
        titleStrings = []
    
    with open('results.csv', 'a') as f:
        for i in range(num_page_items):
            companyStrings.append(company[i].text)
            dateStrings.append(date[i].text)
            refStrings.append(ref[i].text)
            titleStrings.append(title[i].text)
            if companyStrings[i] == '':
                companyStrings[i] = '0'
            if dateStrings[i] == '':
                dateStrings[i] = '0'
            if refStrings[i] == '':
                refStrings[i] = '0'
            if titleStrings[i] == '':
                titleStrings[i] = '0'
            f.write(companyStrings[i] + "#" + dateStrings[i] + "#" + refStrings[i] + "#" + titleStrings[i] + "#" + urlinf + "\n")
    
    driver.close()
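
  • One thing to watch: the loop above still indexes company[i], ref[i] and title[i] directly, so if one of those lists comes back shorter than date (the case that raised the original IndexError) it will fail before the empty-string checks run. A small sketch of a guard, assuming date is always the longest list, is to pad the shorter lists with a placeholder object before the write loop:

    class MissingField:
        # placeholder for an element that was not on the page;
        # .text is what ends up in the CSV for that field
        text = '0'

    def pad(elements, length):
        # return a copy of `elements` padded with placeholders up to `length`
        return list(elements) + [MissingField()] * (length - len(elements))

    company = pad(company, num_page_items)
    ref = pad(ref, num_page_items)
    title = pad(title, num_page_items)

    # the write loop above can then stay as it is: company[i].text, ref[i].text
    # and title[i].text always exist, and missing fields come out as '0'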