python, amazon-web-services, aws-lambda, gensim, word2vec

AWS Lambda Boto gensim model module initialization error: __exit__


I am hosting a word2vec model with gensim on AWS Lambda,

using Python 2.7, boto==2.48.0 and gensim==3.4.0,

and I have a few lines in my function.py file where I load the model directly from S3:

    import boto.s3
    from boto.s3.connection import OrdinaryCallingFormat
    from gensim.models import KeyedVectors

    # region, credentials, S3_BUCKET and S3_KEY are defined elsewhere in function.py
    print('################### connecting to s3...')
    s3_conn = boto.s3.connect_to_region(
        region,
        aws_access_key_id=Aws_access_key_id,
        aws_secret_access_key=Aws_secret_access_key,
        is_secure=True,
        calling_format=OrdinaryCallingFormat()
    )
    print('################### connected to s3...')
    bucket = s3_conn.get_bucket(S3_BUCKET)
    print('################### got bucket...')
    key = bucket.get_key(S3_KEY)
    print('################### got key...')
    # the boto Key object is passed straight to gensim here
    model = KeyedVectors.load_word2vec_format(key, binary=True)
    print('################### loaded model...')

On the model loading line

    model = KeyedVectors.load_word2vec_format(key, binary=True)

I am getting a mysterious error without much detail.

In CloudWatch I can see all of my print messages up to and including '################### got key...', then I get:

START RequestId: {req_id} Version: $LATEST 

followed immediately by [no time delay between these two messages]:

module initialization error: __exit__ 

Please, is there a way to get a detailed error or more info?

More background details: I first tried downloading the model from S3 to /tmp/. It did authorize and retrieve the model file, but it ran out of space [the file is ~2GB, /tmp/ is 512MB].

So I switched to loading the model directly with gensim as above, and now I am getting that mysterious error.

Running the function with python-lambda-local works without issues,

so this probably narrows it down to an issue with gensim's smart_open or with AWS Lambda. I would appreciate any hints, thanks!


Solution

  • Instead of connecting using boto, simply passing an S3 URL to gensim:

    model = KeyedVectors.load_word2vec_format('s3://{}:{}@{}/{}'.format(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET, S3_KEY), binary=True)


    worked!

    But of course, unfortunately, it doesn't answer the question of why the mysterious __exit__ error came up or how to get more info :/
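
On getting more info: the "module initialization error" line in the log seems to carry only the exception's message, not its type or stack. One option is to catch the exception yourself and print the full traceback before re-raising, so it lands in CloudWatch next to the other print output. A minimal sketch, assuming the model is loaded at module level in function.py; load_or_report is a hypothetical helper name, not part of gensim or boto:

```python
import sys
import traceback


def load_or_report(loader):
    """Call loader() and return its result.

    If loader() raises, print the full traceback to stdout and re-raise,
    so the failure is still reported by the runtime as before.
    """
    try:
        return loader()
    except Exception:
        # traceback.format_exc() includes the exception type, message and stack
        print('################### model load failed:')
        print(traceback.format_exc())
        sys.stdout.flush()
        raise


# Hypothetical usage in function.py (key is the boto Key from the question):
# model = load_or_report(
#     lambda: KeyedVectors.load_word2vec_format(key, binary=True))
```

The helper re-raises on purpose: the function still fails, but the log now shows which exception was raised and on which line, which should make an error like __exit__ easier to trace.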