Tags: python, selenium, rest, parallel-processing, joblib

How to implement parallel selenium processing


I asked this question earlier (that post went inactive and is now deleted) but didn't phrase it well, so this is an attempt to improve on it.

Any help would be greatly appreciated.

WHAT I'M TRYING TO DO: Automate certain tasks with Selenium (RESTful API)

  • Go to the website.
  • Search for the task and, if found, open it in a new tab (switch to the new tab, tab1).
  • Click 'seen', then close the tab and switch back to tab0 (this step doesn't return anything).
  • Refresh the first tab (tab0) so 'seen' updates automatically (I think this works via session/cache); a side column shows all seen tasks.
  • After all tasks are done, click 'done' under completed tasks in tab0, which returns a booking code.
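
The tab handling in the steps above can be sketched as a small helper. This is a minimal sketch, not the code from the post: `open_task` and `act_on_task` are hypothetical callables standing in for the site-specific search and the 'seen' click, and only standard WebDriver calls (`window_handles`, `switch_to.window`, `close`, `refresh`) are used on the driver.

```python
def process_in_new_tab(driver, open_task, act_on_task):
    """Handle one task in a second tab, then return to the first tab and refresh it.

    driver      -- a Selenium WebDriver instance
    open_task   -- hypothetical callable: searches for the task and opens it in a new tab
    act_on_task -- hypothetical callable: clicks 'seen' inside the new tab
    Returns True if a new tab was opened and handled, False otherwise.
    """
    main_tab = driver.window_handles[0]        # tab0
    before = set(driver.window_handles)

    open_task(driver)

    # The new tab is whichever handle did not exist before open_task ran.
    new_tabs = [h for h in driver.window_handles if h not in before]
    if not new_tabs:
        return False                           # task not found, nothing to do

    driver.switch_to.window(new_tabs[0])       # tab1
    try:
        act_on_task(driver)
    finally:
        driver.close()                         # closes only the current tab
        driver.switch_to.window(main_tab)      # back to tab0
        driver.refresh()                       # lets the 'seen' column update
    return True
```

Finding the new tab by comparing handles before and after avoids relying on the order of `window_handles`, and the `finally` block returns to tab0 even if the click fails.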

WHAT I HAVE RUNNING:

class SomeClass(SomeOtherClass):
  def do_tasks(self, tasks):
       booking_code = None
       task_done = 0

       driver = self.connect() #spawns a chrome browser

       #I want the below for loop to run in parallel

       for task in tasks:
           try:
               #check_if_task_is_in_search_result_&_then_open_in_new_tab
               #do_something
               task_done += 1
               #close_tab
           except Exception as e:
               #handle_something
               log_error(str(e))

           driver.close()
           driver.switch_to.window(driver.window_handles[0])
           driver.refresh()

       try:
           check = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, 'xxxx'))).click()
       except NoSuchElementException as e:
           log_error(str(e))
       except TimeoutException as e:
           log_error(str(e))
           
       else:
           booking_code = str(driver.find_element(By.CLASS_NAME, "number").text).split(':')[1]
           driver.quit()

       return task_done, booking_code

This is sequential and takes roughly 5 mins for 5 tasks.

GETTING IT TO RUN IN PARALLEL

WHAT I'VE TRIED SO FAR: move the body of the for loop into a new method, do_task, and run it with joblib (from joblib import Parallel, delayed).

class SomeClass(SomeOtherClass):
  def do_task(self, task):
      task_done = 0 #counts this task if it succeeds
      driver = self.connect() #spawns a chrome browser
      try:
         #do_something
         task_done += 1
      except Exception as e:
         #handle_something
         log_error(str(e))
      
      return task_done, driver 

 

  def get_booking_code(self, driver):
      booking_code = None #stays None if the wait or click fails
      try:
           WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, 'xxxx'))).click()
      except (NoSuchElementException, TimeoutException) as e:
           log_error(str(e))
      else:
           booking_code = str(driver.find_element(By.CLASS_NAME, "number").text).split(':')[1]
           driver.quit()
      return booking_code


if __name__ == '__main__':
    tasks = [
        ['task1'],
        ['task2']
    ]
  
    b = SomeClass(site='https://somesite.com/') #chrome connects to this via the self.connect()
    completed_tasks, driver = Parallel(n_jobs=-1)(delayed(b.do_task)(task) for task in tasks)
    booking_code = b.get_booking_code(driver)
    print(completed_tasks, booking_code)

It doesn't run. It spawns a blank chrome browser and closes immediately.

Traceback as below:

  completed_tasks, driver = Parallel(n_jobs=-1)(delayed(b.do_task)(task) for task in tasks)  
  File "--\env\lib\site-packages\joblib\parallel.py", line 1054, in __call__
    self.retrieve()
  File "--\env\lib\site-packages\joblib\parallel.py", line 933, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "--\env\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "--\python\python38\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "c:\users\okwud\appdata\local\programs\python\python38\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
selenium.common.exceptions.SessionNotCreatedException: Message: session not created       
from disconnected: unable to connect to renderer
  (Session info: chrome=91.0.4472.114)


Solution

  • I got the tasks running in parallel yesterday, but I'm now facing an entirely new challenge, which I'll address in a separate post.

    How I got the tasks to run in parallel (using my second code sample, from 'What I've tried so far'):

    
    import multiprocessing
    
    #code excluded on purpose
    
    if __name__ == '__main__':
        tasks = [
            ['task1'],
            ['task2']
        ]
        b = SomeClass()
        with multiprocessing.Pool(processes=2) as p:
            p.map(b.do_task, tasks)
    
    

    The do_task method only returns the number of tasks done (this is an arbitrary return value).
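
The shape of this approach (each call opens and quits its own browser and hands back only a plain number) can be sketched without Selenium. This is a minimal illustration under assumptions, not the code from the post: `FakeBrowser`, `mark_seen`, and `run_all` are hypothetical names, and `multiprocessing.pool.ThreadPool` is swapped in for `multiprocessing.Pool` because it offers the same `map()` interface while keeping the example short and easy to run anywhere.

```python
from multiprocessing.pool import ThreadPool  # same map() interface as multiprocessing.Pool


class FakeBrowser:
    """Hypothetical stand-in for the Chrome driver returned by self.connect()."""

    def __init__(self):
        self.closed = False

    def mark_seen(self, task):
        # Pretend the task is found only when its name is non-empty.
        return bool(task[0])

    def quit(self):
        self.closed = True


def do_task(task):
    """Open a private browser, handle one task, return 1 on success and 0 otherwise."""
    browser = FakeBrowser()            # one browser per call, never shared
    try:
        return 1 if browser.mark_seen(task) else 0
    finally:
        browser.quit()                 # always release the browser


def run_all(tasks, workers=2):
    """Run do_task over all tasks and return (total done, per-task results)."""
    with ThreadPool(processes=workers) as pool:
        per_task = pool.map(do_task, tasks)   # one result per task, in input order
    return sum(per_task), per_task


if __name__ == "__main__":
    tasks = [["task1"], ["task2"], [""]]
    print(run_all(tasks))
```

Returning a count instead of the driver keeps the results simple to collect and sum; with `multiprocessing.Pool`, return values additionally have to be picklable to travel back to the parent process.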