python list selenium get invalidargumentexception

selenium.common.exceptions.InvalidArgumentException: Message: invalid argument while iterating through a list of urls and passing as argument to get()


I am scraping a page to get the URLs and then use them to scrape a bunch of info. I'd like to avoid copying and pasting all the time, but I cannot find how to make get() work with the object. The first part of my code works perfectly well, but when I reach the part that tries to get the URL, I get the following error message:

Traceback (most recent call last):
  File "/Users/rcastong/Desktop/imgs/try-creating-object-url.py", line 61, in <module>
    driver4.get(urlworks2) 
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: chrome=98.0.4758.109)

Here is part of the code:

  #this part works well    
    for number, item in enumerate(imgs2, 1):
            # print('---', number, '---')
        
            img_url = item.get_attribute("href")
            if not img_url:
                print("none")
            else:
                print('"'+img_url+'",')
        
  # the error happens on driver4.get(urlworks2)        
        for i in range(0,30):
            urlworks = img_url[i]
            urlworks2 = urlworks.encode('ascii', 'ignore').decode('unicode_escape')
            driver4 = webdriver.Chrome()
            driver4.get(urlworks2) 
            def check_exists_by_xpath(xpath):
                try:
                    WebDriverWait(driver3,55).until(EC.presence_of_all_elements_located((By.XPATH, xpath)))
                except TimeoutException:
                    return False
                return True
            
            imgsrc2 = WebDriverWait(driver3,55).until(EC.presence_of_all_elements_located((By.XPATH, "//p[@data-testid='artistName']/ancestor::a[contains(@class,'ChildrenLink')]")))                                                                                                                 
            for number, item in enumerate(imgsrc2, 1):
                # print('---', number, '---')
                artisturls = item.get_attribute("href")
                if not artisturls:
                    print("none")
                else:
                    print('"'+artisturls+'",')

Solution

  • This error message...

    Traceback (most recent call last):
      .
        driver4.get(urlworks2) 
      .
        self.execute(Command.GET, {'url': url})
      .
        self.error_handler.check_response(response)
      .
    selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
      (Session info: chrome=98.0.4758.109)
    

    ...implies that the URL passed as an argument to get() was invalid.


    Deep Dive

    Within the first for loop, item.get_attribute("href") returns a URL string, and img_url is overwritten at every iteration. So img_url ends up as a single string (the last href), not a list of URLs as you assumed. As a result, in the second for loop img_url[i] yields one character of that string at a time, and passing a single character to get() raises InvalidArgumentException: Message: invalid argument.


    Demonstration

    As an example, the following lines of code:

    img_url = 'https://www.google.com/'
    for i in range(0,5):
        urlworks = img_url[i]
        urlworks2 = urlworks.encode('ascii', 'ignore').decode('unicode_escape')
        print(urlworks2)
    

    prints:

    h
    t
    t
    p
    s
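
    A defensive check can surface this mistake before a browser is even started. The sketch below is an illustration rather than part of the original code: the helper name looks_like_url is hypothetical, and it relies only on urllib.parse from the standard library. It treats a value as navigable only if it has both a scheme and a network location, which a single character such as 'h' does not.

```python
from urllib.parse import urlparse


def looks_like_url(value):
    """Return True if value is a string with both a scheme and a host."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return bool(parts.scheme) and bool(parts.netloc)


img_url = 'https://www.google.com/'
print(looks_like_url(img_url))     # the whole string is a usable URL
print(looks_like_url(img_url[0]))  # a single character is not
```

    Calling such a check right before get() would turn the opaque "invalid argument" into a message that shows the offending value.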
    

    Solution

    Declare an empty list img_url within the global scope and keep appending the hrefs to it, so that you can iterate over the list later.

    img_url = []
    for number, item in enumerate(imgs2, 1):
        img_url.append(item.get_attribute("href"))
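
    Putting it together, a minimal sketch of the corrected flow might look like the following. The helper name collect_hrefs and the FakeLink stand-in class are hypothetical, introduced only so the example runs without a browser; with Selenium you would pass the imgs2 elements instead and call get() on your driver inside the final loop. The sketch also skips elements whose href is missing, which the original first loop only printed as "none".

```python
def collect_hrefs(elements):
    """Return the non-empty href of every element, in order."""
    hrefs = []
    for item in elements:
        href = item.get_attribute("href")
        if href:  # skip None or empty strings
            hrefs.append(href)
    return hrefs


class FakeLink:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


imgs2 = [
    FakeLink("https://example.com/a"),
    FakeLink(None),
    FakeLink("https://example.com/b"),
]
img_urls = collect_hrefs(imgs2)

for url in img_urls:
    # With a real driver, this is where driver.get(url) would go.
    print(url)
```

    Iterating with for url in img_urls rather than range(0,30) also avoids an IndexError when fewer than 30 links are found, and a single driver created before the loop can be reused for every URL instead of starting a new Chrome instance per iteration.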
    
