Hosting a word2vec model with gensim on AWS Lambda
I'm using Python 2.7, boto==2.48.0 and gensim==3.4.0, and I have a few lines in my function.py file where I load the model directly from S3:
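import boto.s3
from boto.s3.connection import OrdinaryCallingFormat
from gensim.models import KeyedVectors

# region, Aws_access_key_id, Aws_secret_access_key, S3_BUCKET and S3_KEY
# are configuration values defined elsewhere in function.py
print('################### connecting to s3...')
s3_conn = boto.s3.connect_to_region(
    region,
    aws_access_key_id=Aws_access_key_id,
    aws_secret_access_key=Aws_secret_access_key,
    is_secure=True,
    calling_format=OrdinaryCallingFormat()
)
print('################### connected to s3...')
bucket = s3_conn.get_bucket(S3_BUCKET)
print('################### got bucket...')
key = bucket.get_key(S3_KEY)  # boto Key object for the model file
print('################### got key...')
# pass the boto Key straight to gensim instead of downloading it first
model = KeyedVectors.load_word2vec_format(key, binary=True)
print('################### loaded model...')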
On the model-loading line
model = KeyedVectors.load_word2vec_format(key, binary=True)
I get a mysterious error with very little detail.
In CloudWatch I can see all of my print messages up to and including '################### got key...', and then I get:
START RequestId: {req_id} Version: $LATEST
followed immediately (no time delay between the two messages) by:
module initialization error: __exit__
Is there a way to get a more detailed error, or more information about what failed?
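The bare "__exit__" looks like an exception message printed without its type; in Python 2.7, for instance, a with statement raises AttributeError: __exit__ when the object it is given has no __exit__ method. One way to see more is to catch the exception yourself around the module-level load and print the full traceback before re-raising, since CloudWatch already shows everything the function prints. This is a minimal sketch, not a gensim or boto feature; load_with_traceback and its args/kwargs/log parameters are hypothetical names.

```python
from __future__ import print_function

import sys
import traceback


def load_with_traceback(loader, args=(), kwargs=None, log=print):
    """Call loader(*args, **kwargs). If it raises, log the full traceback
    (exception type plus the failing lines) and then re-raise it unchanged."""
    try:
        return loader(*args, **(kwargs or {}))
    except Exception:
        # format_exc() renders the exception currently being handled,
        # including "Traceback (most recent call last):" and "Type: message"
        log('################### model load failed:\n' + traceback.format_exc())
        sys.stdout.flush()  # push the message out before the process goes away
        raise  # keep the original exception for Lambda's own error report
```

In function.py the loading line would then become: model = load_with_traceback(KeyedVectors.load_word2vec_format, args=(key,), kwargs={'binary': True}).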
More background: I was able to download the model from S3 to /tmp/ (authorization worked and the file was retrieved), but the function ran out of space, because the file is ~2 GB and /tmp/ is limited to 512 MB.
So I switched to loading the model directly with gensim, as above, and now I get that mysterious error.
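A ~2 GB file cannot fit in a 512 MB /tmp/, so the download approach was bound to fail. A cheap guard is to compare the object's size with the free space before downloading. Below is a minimal sketch using os.statvfs (Unix only, which Lambda is); free_bytes and fits_in_tmp are hypothetical helper names.

```python
import os


def free_bytes(path='/tmp'):
    """Bytes available to an unprivileged process on the filesystem holding path."""
    st = os.statvfs(path)
    # f_bavail = free blocks usable by non-root, f_frsize = fragment (block) size
    return st.f_bavail * st.f_frsize


def fits_in_tmp(size_bytes, path='/tmp', headroom=0):
    """True if a file of size_bytes, plus optional headroom, fits under path."""
    return size_bytes + headroom <= free_bytes(path)
```

With boto 2, bucket.get_key(...) returns a Key whose size attribute holds the object's size in bytes, so the check before downloading would be fits_in_tmp(key.size).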
Running the function locally with python-lambda-local works without issues, so this probably narrows it down to an issue with gensim's smart_open or with AWS Lambda itself. I'd appreciate any hints, thanks!
Instead of connecting through boto, simply passing an s3:// URI to gensim:
model = KeyedVectors.load_word2vec_format('s3://{}:{}@{}/{}'.format(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET, S3_KEY), binary=True)
worked!
Unfortunately, that still doesn't explain why the mysterious __exit__ error came up, or how to get more information about it :/
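The working call builds a URI of the form s3://ACCESS_KEY:SECRET_KEY@BUCKET/KEY inline. A small helper can keep that string in one place. This is a sketch: s3_uri is a hypothetical name, and the credential-free branch assumes the installed smart_open lets boto look up credentials on its own when the URI carries none, which is worth verifying against your version before relying on it.

```python
def s3_uri(bucket, key, access_key=None, secret_key=None):
    """Build s3://ACCESS_KEY:SECRET_KEY@BUCKET/KEY (same shape as the working
    call), or s3://BUCKET/KEY when no credentials are given."""
    if (access_key is None) != (secret_key is None):
        raise ValueError('pass both access_key and secret_key, or neither')
    if access_key is None:
        # Assumption: the underlying library resolves credentials itself here
        return 's3://{}/{}'.format(bucket, key)
    return 's3://{}:{}@{}/{}'.format(access_key, secret_key, bucket, key)
```

Usage: model = KeyedVectors.load_word2vec_format(s3_uri(S3_BUCKET, S3_KEY, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY), binary=True).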