Tags: python, google-app-engine, flask, google-cloud-datastore

Memory leak in Google App Engine / Datastore / Flask / Python app


I have built a simple news-aggregator site in which the memory usage of all my App Engine instances keeps growing until it reaches the limit, at which point the instance is shut down.

I have started to eliminate everything from my app to arrive at a minimal reproducible version. This is what I have now:


from flask import Flask
from google.cloud import datastore

app = Flask(__name__)

datastore_client = datastore.Client()

@app.route('/')
def root():
    query = datastore_client.query(kind='source')
    query.order = ['list_sequence']
    sources = query.fetch()

    # Iterating is what actually pulls the entities into memory.
    for source in sources:
        pass

    return 'OK'
    

Stats show a typical saw-tooth pattern: at instance startup, memory goes to 190-210 MB, then on some requests, but NOT all requests, usage increases by 20-30 MB. (This, by the way, roughly corresponds to the estimated size of the query results, although I cannot be sure that is relevant.) This keeps happening until usage exceeds 512 MB, at which point the instance is shut down. It usually happens around the 50th-100th request to "/". No other requests are made to anything else in the meantime.

Now, if I eliminate the "for" loop so that only the query remains, the problem goes away: memory usage stays flat at 190 MB, with no increase even after 100+ requests.
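(This would be consistent with fetch() returning a lazy iterator: nothing is materialized until the loop consumes it. A pure-Python sketch of that effect, standing in for the Datastore client rather than using it:)

```python
import sys

def fetch():
    # Lazy generator standing in for query.fetch(): entities are
    # only built when the caller iterates.
    for i in range(100_000):
        yield {'list_sequence': i, 'payload': 'x' * 100}

sources = fetch()              # cheap: just a generator object
print(sys.getsizeof(sources))  # small, fixed size

consumed = list(sources)       # iterating is what allocates
print(len(consumed))
```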

gc.collect() at the end does not help. I have also tried diffing tracemalloc snapshots taken at the beginning and end of the function, but found nothing useful.
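(For reference, the snapshot comparison looked roughly like this; the allocation below is a stand-in marking where the query and loop went, not the Datastore client itself:)

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# ... the query + for loop went here; stand-in allocation for illustration:
data = ['x' * 100 for _ in range(1000)]

after = tracemalloc.take_snapshot()

# Print the ten biggest per-line memory differences.
for stat in after.compare_to(before, 'lineno')[:10]:
    print(stat)
```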

Has anyone experienced anything similar? Any ideas about what might be going wrong here? What additional tests or investigations would you recommend? Could this be a Google App Engine / Datastore issue outside my control?

Thank you.



Solution

  • @Alex did some thorough research in the other answer, so I will follow up with this recommendation: try using the NDB library. All calls with this library have to be wrapped in a context manager, which guarantees cleanup when the context closes. That could fix your problem:

    from google.cloud import ndb

    ndb_client = ndb.Client(**init_client)  # init_client: your client settings

    with ndb_client.context():
        query = MyModel.query().order(MyModel.my_column)
        sources = query.fetch()
        for source in sources:
            pass

    # Querying Datastore outside the context manager raises an error:
    query = MyModel.query().order(MyModel.my_column)