I wrote my code and uploaded it to AWS Lambda successfully via the AWS SAM CLI. It opens the URL I give it and prints the title of the website. It is very beginner-level code. Below is my code:
import os, shutil, uuid, time
from selenium import webdriver

def setup():
    BIN_DIR = "/tmp/bin"
    if not os.path.exists(BIN_DIR):
        print("Creating bin folder")
        os.makedirs(BIN_DIR)
    LIB_DIR = '/tmp/bin/lib'
    if not os.path.exists(LIB_DIR):
        print("Creating lib folder")
        os.makedirs(LIB_DIR)
    for filename in ['chromedriver', 'headless-chromium', 'lib/libgconf-2.so.4', 'lib/libORBit-2.so.0']:
        oldfile = f'/opt/{filename}'
        newfile = f'{BIN_DIR}/{filename}'
        shutil.copy2(oldfile, newfile)
        os.chmod(newfile, 0o775)

def init_web_driver():
    setup()
    chrome_options = webdriver.ChromeOptions()
    _tmp_folder = '/tmp/{}'.format(uuid.uuid4())
    if not os.path.exists(_tmp_folder):
        os.makedirs(_tmp_folder)
    if not os.path.exists(_tmp_folder + '/user-data'):
        os.makedirs(_tmp_folder + '/user-data')
    if not os.path.exists(_tmp_folder + '/data-path'):
        os.makedirs(_tmp_folder + '/data-path')
    if not os.path.exists(_tmp_folder + '/cache-dir'):
        os.makedirs(_tmp_folder + '/cache-dir')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--window-size=1280x1696')
    chrome_options.add_argument('--user-data-dir={}'.format(_tmp_folder + '/user-data'))
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--enable-logging')
    chrome_options.add_argument('--log-level=0')
    chrome_options.add_argument('--v=99')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--data-path={}'.format(_tmp_folder + '/data-path'))
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument('--homedir={}'.format(_tmp_folder))
    chrome_options.add_argument('--disk-cache-dir={}'.format(_tmp_folder + '/cache-dir'))
    chrome_options.add_argument(
        'user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36')
    chrome_options.binary_location = "/tmp/bin/headless-chromium"
    driver = webdriver.Chrome(chrome_options=chrome_options)
    return driver

def lambda_handler(event, context):
    driver = init_web_driver()
    driver.get("http://www.mjlivesey.co.uk")
    time.sleep(4)
    print(driver.title)
When I click the "Test" button for the first time, I can see the value of driver.title in the output screen. But when I click it again just a couple of seconds later, the error below shows up:
{
  "errorMessage": "[Errno 26] Text file busy: '/tmp/bin/chromedriver'",
  "errorType": "OSError",
  "stackTrace": [
    "  File \"/var/task/app.py\", line 61, in lambda_handler\n    driver = init_web_driver()\n",
    "  File \"/var/task/app.py\", line 22, in init_web_driver\n    setup()\n",
    "  File \"/var/task/app.py\", line 18, in setup\n    shutil.copy2(oldfile, newfile)\n",
    "  File \"/var/lang/lib/python3.7/shutil.py\", line 266, in copy2\n    copyfile(src, dst, follow_symlinks=follow_symlinks)\n",
    "  File \"/var/lang/lib/python3.7/shutil.py\", line 121, in copyfile\n    with open(dst, 'wb') as fdst:\n"
  ]
}
And if I wait half an hour or more, I can run the code successfully again. I don't understand what the problem is here. Maybe you can help me see what I am missing.
Thanks.
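It seems that several invocations of the same function are touching the same files while a previous execution still has them in use. "Text file busy" (errno 26, ETXTBSY) is what the operating system reports when a file that is currently being executed is opened for writing; here, the chromedriver started by the earlier invocation is still running from /tmp/bin/chromedriver when setup() tries to overwrite it.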
To understand this behavior, it's useful to understand how the Lambda execution environment actually works. Basically, if you invoke the same function several times in a short span of time, AWS will try to reuse the same execution environment, including the contents of /tmp, instead of creating a new one; this avoids the cost of a cold start.
By analogy, what is happening is the same as running your code locally several times against the same files and folders. Since every run reads and writes the same paths, conflicts are inevitable.
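The reuse described above can be observed with a small experiment. This is only a sketch: the marker file name and the returned dictionary are hypothetical names chosen for illustration, and whether a given invocation lands on a warm environment is decided by AWS, not by your code.

```python
import os
import tempfile

# Module-level code runs once per execution environment (cold start),
# not once per invocation.
INVOCATION_COUNT = 0

# Hypothetical marker file; tempfile.gettempdir() resolves to /tmp on Lambda.
MARKER = os.path.join(tempfile.gettempdir(), "warm-start-marker")


def lambda_handler(event, context):
    """Report whether state from an earlier invocation is still around."""
    global INVOCATION_COUNT
    INVOCATION_COUNT += 1

    # Files written to /tmp by an earlier invocation are still there
    # when the same environment is reused.
    marker_existed = os.path.exists(MARKER)
    if not marker_existed:
        with open(MARKER, "w") as f:
            f.write("created on first invocation\n")

    return {"invocation": INVOCATION_COUNT, "marker_existed": marker_existed}
```

If you click "Test" twice in quick succession, the second result will typically show an invocation count above 1 and marker_existed set to true, which is the same reuse that leaves /tmp/bin/chromedriver in place (and in use) in your function.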
In your case, you should refactor your setup() function so that its content gets executed only once per execution environment.
Also, you should be mindful of the fact that the /tmp directory has a hard size limit (512 MB by default), after which your function will fail. If you want to persist your data and/or have more headroom, you should consider attaching EFS to your Lambda function.
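Since the code in the question creates a new /tmp/<uuid> folder on every invocation and never removes it, a reused environment can fill up over time. The sketch below shows one way to keep an eye on free space and to clean up a per-invocation scratch folder; free_mb and run_with_scratch_dir are hypothetical helper names, not part of any AWS or Selenium API.

```python
import shutil
import tempfile


def free_mb(path):
    """Return the free space on the filesystem containing path, in MB."""
    return shutil.disk_usage(path).free / (1024 * 1024)


def run_with_scratch_dir(work, base_dir=None):
    """Call work(scratch_path) with a fresh folder that is deleted afterwards.

    base_dir=None uses the default temp directory (/tmp on Lambda).
    """
    with tempfile.TemporaryDirectory(dir=base_dir) as scratch:
        return work(scratch)
```

In the handler, the user-data, data-path and cache directories could be created under the scratch path instead of a hand-built /tmp/<uuid> folder. The browser should be shut down before the with block exits, otherwise it may still be writing to files that are being deleted.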