Tags: google-app-engine, google-cloud-platform, pytorch, torch

Trouble installing Torch on Google App Engine


I've built a machine learning API that uses Torch as the ML framework. When I upload the code to Google App Engine, it runs out of memory.
After some debugging I found out that the issue is the installation of Torch.

I'm using Torch 1.5.0 and Python 3.7.4.

So how do I fix this error? Maybe I can change something in app.yaml?

Error message:

Step #1 - "builder": Traceback (most recent call last):
Step #1 - "builder":   File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
Step #1 - "builder":     "__main__", fname, loader, pkg_name)
Step #1 - "builder":   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
Step #1 - "builder":     exec code in run_globals
Step #1 - "builder":   File "/usr/local/bin/ftl.par/__main__.py", line 65, in <module>
Step #1 - "builder":   File "/usr/local/bin/ftl.par/__main__.py", line 54, in main
Step #1 - "builder":   File "/usr/local/bin/ftl.par/__main__/ftl/python/builder.py", line 114, in Build
Step #1 - "builder":   File "/usr/local/bin/ftl.par/__main__/ftl/python/layer_builder.py", line 153, in BuildLayer
Step #1 - "builder":   File "/usr/local/bin/ftl.par/__main__/ftl/common/single_layer_image.py", line 60, in GetCacheKey
Step #1 - "builder":   File "/usr/local/bin/ftl.par/__main__/ftl/python/layer_builder.py", line 109, in GetCacheKeyRaw
Step #1 - "builder":   File "/usr/local/bin/ftl.par/__main__/ftl/python/layer_builder.py", line 332, in GetCacheKeyRaw
Step #1 - "builder":   File "/usr/local/bin/ftl.par/__main__/ftl/python/layer_builder.py", line 346, in _python_version
Step #1 - "builder":   File "/usr/lib/python2.7/subprocess.py", line 394, in __init__
Step #1 - "builder":     errread, errwrite)
Step #1 - "builder":   File "/usr/lib/python2.7/subprocess.py", line 938, in _execute_child
Step #1 - "builder":     self.pid = os.fork()
Step #1 - "builder": OSError: [Errno 12] Cannot allocate memory

Again, this error message did not appear when I didn't include torch in my requirements.txt.

To reproduce:

app.yaml

runtime: python37
resources:
    memory_gb: 16
    disk_size_gb: 10
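
Worth noting: the `resources:` block is only honored in the App Engine flexible environment. In the standard `python37` runtime it is ignored, and instance memory is selected via `instance_class` instead. A sketch of the standard-environment equivalent (this raises instance memory, not build memory, so it is not guaranteed to fix the torch install failure):

```yaml
runtime: python37
# Standard environment: memory is set by instance class,
# not by a resources block. F4_1G is the largest
# automatic-scaling class.
instance_class: F4_1G
```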

requirements.txt

gunicorn==20.0.4
aniso8601==8.0.0
beautifulsoup4==4.9.0
boto3==1.13.3
botocore==1.16.3
bs4==0.0.1
certifi==2020.4.5.1
chardet==3.0.4
click==7.1.2
colorama==0.4.3
docutils==0.15.2
filelock==3.0.12
Flask==1.1.2
Flask-RESTful==0.3.8
googletrans==2.4.0
idna==2.9
itsdangerous==1.1.0
Jinja2==2.11.2
jmespath==0.9.5
joblib==0.14.1
MarkupSafe==1.1.1
numpy==1.18.4
protobuf==3.11.3
python-dateutil==2.8.1
pytz==2020.1
regex==2020.4.4
requests==2.23.0
s3transfer==0.3.3
sacremoses==0.0.43
sentencepiece==0.1.86
six==1.14.0
soupsieve==2.0
tokenizers==0.5.2
tqdm==4.46.0
transformers==2.8.0
urllib3==1.25.9
Werkzeug==1.0.1

main.py

from flask import Flask, request
from flask_restful import Api, Resource

app = Flask(__name__)
api = Api(app)

production = False

# Import api code

# Create main api 'view'
class main_api(Resource):

    def get(self):
        question = request.args.get('question')

        # Run the script
        # But not necessary for the minimum working test

        return {
            'question': question,
            # 'results': results_from_script,
        }

# Adds resource
api.add_resource(main_api, '/')

# Starts the api
if __name__ == '__main__':
    host = '127.0.0.1'
    port = 8080
    app.run(host=host, port=port, debug=not production)
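
The endpoint above can be exercised without binding a port by using Flask's built-in test client. A minimal sketch with plain Flask (the `flask_restful` `Api`/`Resource` wrappers are omitted here to keep the example dependency-light; the behavior is the same):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/')
def main_api():
    # Echo the 'question' query parameter back as JSON,
    # mirroring main_api.get() above
    return jsonify({'question': request.args.get('question')})

# Exercise the endpoint with the test client (no server needed)
client = app.test_client()
print(client.get('/?question=hello').get_json())
```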

Solution

  • I fixed this error by using the flex environment.
    The only thing I had to change was the app.yaml:

    runtime: python
    env: flex
    entrypoint: gunicorn -b :$PORT main:app
    runtime_config:
        python_version: 3
    
    manual_scaling:
        instances: 1
    resources:
        cpu: 2
        memory_gb: 5
        disk_size_gb: 10
    

    And then it was ready to be deployed.