Proper way to wrap a context manager in a decorator in Python?


I have several webpages that I would like to scrape using Selenium. I want to automate this and run it on a remote machine. Since each website is different, each script needs different functionality to do its job. Rather than repeating the same code in every script to start a virtual display and a webdriver, I have a rough idea of using a decorator that starts both, like so:

    from typing import Callable
    from pyvirtualdisplay import Display
    from selenium import webdriver

    def open_headless_browser(func: Callable) -> Callable:
        disp = Display(visible=False, size=(100, 100))
        options = webdriver.ChromeOptions()
        options.add_argument("--headless=new")
        options.add_argument("--dns-prefetch-disable")
        def start() -> None:
            with disp as display:
                with webdriver.Chrome(options=options) as wd:
                    func()
        return start

And then i can potentially have my scripts (the one that will actually perform the scraping) like so:

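    from selenium.webdriver.common.by import By

    # The find_elements_by_* helpers were removed in Selenium 4.3,
    # so find_elements(By..., ...) is used here instead.
    @open_headless_browser
    def scrape_abc(url_abc: str) -> None:
        driver.get(url_abc)
        driver.find_elements(By.XPATH, 'abc')

    @open_headless_browser
    def scrape_xyz(url_xyz: str) -> None:
        driver.get(url_xyz)
        driver.find_elements(By.CSS_SELECTOR, 'xyz')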

However, several things concern me:

  • Is the code in my scrape_abc and scrape_xyz functions a bit awkward, since it has no idea what driver is (it is defined in the decorator)?
  • Would this even work? Am I over-complicating things, or am I just approaching this idea incorrectly?
  • Is this Pythonic?

I am on Python 3.10, Selenium 4.15, and PyVirtualDisplay 3.0.

EDIT: after some thinking, this approach will not work after all. The decorated functions will not have access to the webdriver object defined in the decorator.


Solution

  • EDIT: after some thinking, this approach will not work after all. The decorated functions will not have access to the webdriver object defined in the decorator

    Sure it will. You just need to pass wd as an argument to the function, something like this:

    from typing import Callable
    from pyvirtualdisplay import Display
    from selenium import webdriver

    def open_headless_browser(func: Callable) -> Callable:
        disp = Display(visible=False, size=(100, 100))
        options = webdriver.ChromeOptions()
        options.add_argument("--headless=new")
        options.add_argument("--dns-prefetch-disable")
        def start() -> None:
            with disp as display:
                with webdriver.Chrome(options=options) as wd:
                    func(wd)  # hand the live driver to the decorated function
        return start
    

    Then your functions will look like:

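    from selenium.webdriver.common.by import By

    # The find_elements_by_* helpers were removed in Selenium 4.3,
    # so find_elements(By.XPATH, ...) is used here instead.
    @open_headless_browser
    def scrape_abc(driver: webdriver.Chrome) -> None:
        driver.get(url_abc)  # url_abc is assumed to be defined at module level
        driver.find_elements(By.XPATH, 'abc')
    
    @open_headless_browser
    def scrape_xyz(driver: webdriver.Chrome) -> None:
        driver.get(url_xyz)  # likewise assumed to be defined at module level
        driver.find_elements(By.XPATH, 'xyz')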
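
To see the injection at work without launching Chrome, here is a minimal sketch that swaps in a stand-in driver. `FakeDriver` and `with_driver` are hypothetical names used only for illustration and are not part of Selenium. The point is that the decorated function is called with no arguments while the decorator supplies the driver.

```python
from typing import Callable


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome; records the URLs it is asked to load."""

    def __init__(self) -> None:
        self.visited: list[str] = []
        self.closed = False

    def __enter__(self) -> "FakeDriver":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.closed = True  # mirrors the driver being shut down on exit

    def get(self, url: str) -> None:
        self.visited.append(url)


def with_driver(func: Callable[[FakeDriver], None]) -> Callable[[], FakeDriver]:
    def start() -> FakeDriver:
        with FakeDriver() as wd:
            func(wd)  # the decorator, not the caller, supplies the driver
        return wd     # returned only so the sketch can be inspected afterwards
    return start


@with_driver
def scrape_abc(driver: FakeDriver) -> None:
    driver.get("https://example.com/abc")


driver = scrape_abc()  # note: called with no arguments
print(driver.visited, driver.closed)
```

By the time `scrape_abc()` returns, the `with` block has already exited, so any real driver would be closed as well. Anything the scraper needs from the page has to be collected inside the decorated function.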
    

    If you want to be able to pass in a URL, you need to define arguments in the wrapper function, too:

    from typing import Callable
    from pyvirtualdisplay import Display
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    def open_headless_browser(func: Callable) -> Callable:
        disp = Display(visible=False, size=(100, 100))
        options = webdriver.ChromeOptions()
        options.add_argument("--headless=new")
        options.add_argument("--dns-prefetch-disable")
        def start(url: str) -> None:
            with disp as display:
                with webdriver.Chrome(options=options) as wd:
                    func(wd, url)  # driver first, then the caller's URL
        return start
    
    @open_headless_browser
    def scrape_abc(driver: webdriver.Chrome, url: str) -> None:
        driver.get(url)
        # find_elements_by_xpath was removed in Selenium 4.3
        driver.find_elements(By.XPATH, 'abc')
    

    Then it's just a case of remembering that although you define the function with two parameters (driver and url), you call it with only one (url), because the decorator supplies the driver.
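
If different scrapers need different arguments, one option is to have the wrapper accept `*args` and `**kwargs` and forward them after the driver, and to apply `functools.wraps` so the decorated function keeps its name and docstring. The sketch below uses a hypothetical stand-in (`FakeDriver`) and a hypothetical decorator name (`open_fake_browser`) so that it runs without a browser. In real use, the `with` line would presumably open `webdriver.Chrome(options=options)` as in the answer above.

```python
import functools
from typing import Any, Callable


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome, used so the sketch runs anywhere."""

    def __enter__(self) -> "FakeDriver":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        pass

    def get(self, url: str) -> str:
        return f"loaded {url}"


def open_fake_browser(func: Callable[..., Any]) -> Callable[..., Any]:
    @functools.wraps(func)  # keep func's __name__ and __doc__ on the wrapper
    def start(*args: Any, **kwargs: Any) -> Any:
        with FakeDriver() as wd:
            # The driver goes first; whatever the caller passed follows it.
            return func(wd, *args, **kwargs)
    return start


@open_fake_browser
def scrape(driver: FakeDriver, url: str, retries: int = 1) -> str:
    """Load a page and report what happened."""
    return f"{driver.get(url)} (retries={retries})"


result = scrape("https://example.com/abc", retries=3)
print(result)
print(scrape.__name__)
```

Returning the result from the wrapper also lets a scraper hand back the data it collected, which the `start` functions above discard.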