Tags: amazon-web-services, aws-lambda, selenium-chromedriver

AWS Lambda download a file using Chromedriver


I have a container that is built to run selenium-chromedriver with Python to download an Excel (.xlsx) file from a website.

I am using SAM to build and deploy this image to run in AWS Lambda.

When I build the container and invoke it locally, the program executes as expected: The download occurs and I can see the file placed in the root directory of the container.

The problem is: when I deploy this image to AWS and invoke my lambda function I get no errors, however, my download is never executed. The file never appears in my root directory.

My first thought was that I hadn't allocated enough memory to the Lambda function. I gave it 512 MB, and the logs showed it using 416 MB, so maybe there wasn't enough room to fit another file? I increased the memory to 1024 MB, but still no luck.

My next thought was that maybe the download was just taking a long time, so I also allowed the program to wait for 5 minutes after clicking the download to ensure that the download is given time to complete. Still no luck.
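A fixed sleep can be replaced by polling the download directory, which also makes it obvious in the logs when no file ever arrives. The following is only a minimal sketch: `wait_for_download` is a hypothetical helper name, and it assumes the finished file ends in `.xlsx` while Chrome keeps a `.crdownload` suffix on the file until the download completes.

```python
import os
import time


def wait_for_download(directory, suffix=".xlsx", timeout=60.0, poll_interval=0.5):
    """Return the path of the first finished download found in `directory`.

    Hypothetical helper. In-progress Chrome downloads end in `.crdownload`,
    so matching on `suffix` skips them. Raises TimeoutError if no matching
    file shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        finished = [
            name for name in sorted(os.listdir(directory))
            if name.endswith(suffix)
        ]
        if finished:
            return os.path.join(directory, finished[0])
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no {suffix} file appeared in {directory} within {timeout}s"
            )
        time.sleep(poll_interval)
```

Called right after the click that triggers the download, for example `path = wait_for_download("/tmp", timeout=300)`, it returns as soon as the file exists instead of always waiting the full 5 minutes.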

I have also tried setting the following options for chromedriver (full list of chromedriver options posted at bottom):

options.add_argument(f"--user-data-dir={'/tmp'}")
options.add_argument(f"--data-path={'/tmp'}")
options.add_argument(f"--disk-cache-dir={'/tmp'}")

and also setting tempfolder = mkdtemp() and passing that into the chrome options as above in place of /tmp. Still no luck.

Since this application is in a container, it should run the same locally as it does on AWS. So I am wondering whether some configuration outside the container is blocking my ability to download a file. Maybe the request is going out but the response is not being allowed back in?

Please let me know if there is anything I need to clarify -- Any help on this issue is greatly appreciated!

Full list of Chromedriver options

        options.binary_location = '/opt/chrome/chrome'
        options.headless = True
        options.add_argument('--disable-extensions')
        options.add_argument('--no-first-run')
        options.add_argument('--ignore-certificate-errors')
        options.add_argument('--disable-client-side-phishing-detection')
        options.add_argument('--allow-running-insecure-content')
        options.add_argument('--disable-web-security')
        options.add_argument('--lang=' + random.choice(language_list))
        options.add_argument('--user-agent=' + fake_user_agent.user_agent())
        options.add_argument('--no-sandbox')
        options.add_argument("--window-size=1920x1080")
        options.add_argument("--single-process")
        options.add_argument("--disable-dev-shm-usage")
        options.add_argument("--disable-dev-tools")
        options.add_argument("--no-zygote")
        options.add_argument(f"--user-data-dir={'/tmp'}")
        options.add_argument(f"--data-path={'/tmp'}")
        options.add_argument(f"--disk-cache-dir={'/tmp'}")
        options.add_argument("--remote-debugging-port=9222")
        options.add_argument("start-maximized")
        options.add_argument("enable-automation")
        options.add_argument("--headless")
        options.add_argument("--disable-browser-side-navigation")
        options.add_argument("--disable-gpu")

        driver = webdriver.Chrome("/opt/chromedriver", options=options)

Solution

  • Just in case anybody stumbles across this question in the future, adding the following to the Chrome options solved my issue:

    prefs = {
        "profile.default_content_settings.popups": 0,
        "download.default_directory": r"/tmp",
        "directory_upgrade": True
        }
    options.add_experimental_option("prefs", prefs)
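A likely reason this works: in Lambda the filesystem is read-only apart from /tmp, so a download aimed at any other directory (such as the container root, which is writable locally) has nowhere to land. The sketch below wraps the same prefs in a hypothetical helper, `build_download_prefs`, that refuses a directory the process cannot write to, so a bad path fails loudly instead of silently producing no file. The pref keys are the ones from the answer above; the helper name and the writability check are illustrative additions.

```python
import os


def build_download_prefs(download_dir="/tmp"):
    """Build the Chrome "prefs" dict that points downloads at `download_dir`.

    Hypothetical helper: raises ValueError when the directory is missing or
    not writable, which is the situation for most paths inside Lambda.
    """
    if not os.path.isdir(download_dir) or not os.access(download_dir, os.W_OK):
        raise ValueError(f"download directory is not writable: {download_dir}")
    return {
        "profile.default_content_settings.popups": 0,
        "download.default_directory": download_dir,
        "directory_upgrade": True,
    }


# Usage with the existing `options` object from the question:
#   options.add_experimental_option("prefs", build_download_prefs("/tmp"))
```

Keeping the check next to the prefs means a misconfigured path shows up as an exception in the CloudWatch logs rather than as a run that reports no errors.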