
What is the right way to implement a "Selenium driver pool" in Java?


I wrote a web automation tool using Selenium WebDriver and the geckodriver in Java. Currently every time I execute the task, a new FirefoxDriver object is instantiated.

Now I want to implement multithreading. The first approach that came to mind was building something like a fixed-size pool: instantiate X FirefoxDriver objects at launch, wrap each one in an object with an "inUse" flag, and use a singleton to manage these instances.

But is this the proper solution? This is my first Selenium project and the whole concept is new to me. After several days of googling and reading the documentation, I still haven't been able to answer this question myself. I would really appreciate your help!


Solution

  • It would be entirely reasonable to create a pool like this. Rather than writing your own pool, I would suggest using an existing generic pool, such as the classic Apache Commons Pool or the more modern spf4j pool.

    For a pool to work, your code must reliably return the driver to the pool after use, or else you will leak drivers (and therefore entire browser instances!).
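
    The borrow/return discipline described above can be sketched with a try/finally wrapper, so that a driver goes back to the pool even when the task throws. This is only a minimal illustration built on the JDK, not a substitute for an existing pool library. SimplePool, withResource and idleCount are hypothetical names, and the type parameter T would be WebDriver in the real tool.

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical fixed-size pool; T would be WebDriver in the real tool.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    // Expects a non-empty collection of pre-created instances.
    SimplePool(Collection<T> resources) {
        this.idle = new ArrayBlockingQueue<>(resources.size(), false, resources);
    }

    // Borrows a resource, runs the task, and always returns the resource.
    <R> R withResource(Function<T, R> task) {
        T resource;
        try {
            resource = idle.take(); // blocks until a resource is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
        try {
            return task.apply(resource);
        } finally {
            idle.add(resource); // runs even if the task throws
        }
    }

    // Number of resources currently available for borrowing.
    int idleCount() {
        return idle.size();
    }
}
```

    With Selenium, a call might look like pool.withResource(driver -> { driver.get(url); return driver.getTitle(); }), where url is whatever page the task needs. Because callers never hold a driver outside the lambda, they cannot forget to return it.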

    Therefore, I would consider a different approach: dedicating a driver to each thread. You can do this using a ThreadLocal:

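    // Imports for the enclosing source file:
    // import org.openqa.selenium.WebDriver;
    // import org.openqa.selenium.firefox.FirefoxDriver;

    private final ThreadLocal<WebDriver> drivers = new ThreadLocal<WebDriver>() {
        @Override
        protected WebDriver initialValue() {
            return new FirefoxDriver(); // or whatever
        }

        @Override
        public void remove() {
            // quit() ends the session and shuts down the browser;
            // close() would only close the current window.
            // Note: get() creates a driver first if this thread has none yet.
            WebDriver driver = get();
            if (driver != null) driver.quit();
            super.remove();
        }

        @Override
        public void set(WebDriver value) {
            // Drivers are created only through initialValue().
            throw new UnsupportedOperationException();
        }
    };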
    

    It is still possible to leak drivers, but only if threads die without removing their value from the ThreadLocal. If you have a fixed pool of threads which you reuse, you'll be fine.
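
    One way to make the fixed-thread-pool case safe at shutdown is to record every per-thread instance as it is created and close them all once the workers have finished. The following is a rough sketch under that assumption, not part of Selenium. PerThreadResources, createdCount and closeAll are hypothetical names, and a StringBuilder stands in for the driver so the example runs without a browser.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper: one resource per worker thread, all closed at shutdown.
final class PerThreadResources<T> {
    private final Queue<T> created = new ConcurrentLinkedQueue<>();
    private final ThreadLocal<T> local;

    PerThreadResources(Supplier<T> factory) {
        // Each thread builds its resource on first use and registers it.
        this.local = ThreadLocal.withInitial(() -> {
            T resource = factory.get();
            created.add(resource);
            return resource;
        });
    }

    // Returns the calling thread's own resource.
    T get() {
        return local.get();
    }

    // Number of resources created and not yet closed.
    int createdCount() {
        return created.size();
    }

    // Call only after the worker threads have finished their tasks.
    void closeAll(Consumer<T> closer) {
        T resource;
        while ((resource = created.poll()) != null) {
            closer.accept(resource);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PerThreadResources<StringBuilder> resources =
                new PerThreadResources<>(StringBuilder::new);
        ExecutorService workers = Executors.newFixedThreadPool(3);
        for (int i = 0; i < 10; i++) {
            // Ten tasks share the resources of at most three threads.
            workers.execute(() -> resources.get().append('x'));
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("resources created: " + resources.createdCount());
        resources.closeAll(sb ->
                System.out.println("closing resource used " + sb.length() + " times"));
    }
}
```

    With Selenium, the equivalent would presumably be new PerThreadResources<WebDriver>(FirefoxDriver::new) at startup and closeAll(WebDriver::quit) after the executor has terminated. The number of browsers is then bounded by the size of the thread pool rather than by the number of tasks.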

    A disadvantage of this approach is that you end up with a driver for every thread which has ever used a driver, even if not every one of those threads is making use of a driver at the same time. If you have a pool of threads dedicated to using drivers, and doing little else, then this will not be a significant problem. But if you have a pool of threads which do many different kinds of work, it may be.