I have a weird issue which I think I have managed to narrow down to the combination of SQLAlchemy and multiprocessing.
I am using Flask, Flask-SQLAlchemy and PyMySQL with a MySQL database.
I have a Flask app (models.py):

    from flask import Flask
    from flask_sqlalchemy import SQLAlchemy

    app = Flask('appname')
    app.config['SQLALCHEMY_DATABASE_URI'] = 'mysql+pymysql://...'  # real URI not shown here
    db = SQLAlchemy(app)

    # ... model definitions, including TaskLog, follow here
I then use Flask to start a function from a different Python file via multiprocessing (app.py):

    import multiprocessing

    from models import app, db, TaskLog
    from functions import do_stuff

    @app.route('/some/address', methods=['GET'])
    def start_process():
        # task_id and work_location come from the request (details omitted)
        process = multiprocessing.Process(target=do_stuff, args=(task_id, work_location))
        process.name = task_id
        process.start()
        return '', 200

    @app.route('/something/<task_id>/checklog', methods=['GET'])
    def check_log(task_id):
        task_log = db.session.query(TaskLog).filter_by(task_id=task_id).all()
        return str(task_log), 200  # serialisation simplified
The other file, which holds the do_stuff() function, imports the same db (functions.py):

    from models import app, db

    # Do some stuff reading from and writing to the db
What's the problem:

From time to time I get odd errors such as:

    pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')

or

    pymysql.err.InterfaceError: (0, '')

or

    sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2014, 'Command Out of Sync')
And the error does not always occur in the same place. Sometimes it is raised on a line that runs a query, other times on the line that calls db.session.commit(), and so on. The strange part is that it sometimes runs fine for hours and then errors out in a seemingly random location.
I think the issue is that I am importing the same db declared in models.py into both app.py and functions.py (which runs as a multiprocessing process).
Can you confirm that this is in fact the issue, and suggest how this should be done properly with Flask-SQLAlchemy?
Any help would be greatly appreciated.
The way to resolve this issue is to change how multiprocessing starts the worker process.

On Unix, multiprocessing by default forks the current process instead of starting a fresh one. The way around this is to set the start method to "spawn" before creating the process:
    multiprocessing.set_start_method('spawn')

    process = multiprocessing.Process(target=do_stuff, args=(task_id, work_location))
    process.start()
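One caveat worth noting: set_start_method() may only be called once per interpreter, so calling it inside the route handler raises RuntimeError on the second request. A minimal sketch of a safer pattern using get_context(), which can be called any number of times (the start_worker() helper name is mine, not from the original code):

    import multiprocessing

    def do_stuff(task_id, work_location):
        # Stand-in for the real worker imported from functions.py.
        print(f"working on {task_id} in {work_location}")

    # A context object carries the start method with it, so there is no
    # process-wide state to set and nothing that can only happen once.
    spawn_ctx = multiprocessing.get_context("spawn")

    def start_worker(task_id, work_location):
        # Processes created from this context always use "spawn",
        # regardless of the platform default.
        process = spawn_ctx.Process(target=do_stuff, args=(task_id, work_location))
        process.name = str(task_id)
        process.start()
        return process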
With the start method set to "spawn", multiprocessing starts a fresh Python interpreter and the child inherits nothing from the parent.

If you just call process.start() without setting the start method, the forked child inherits the parent's state, including the MySQL connection already open in SQLAlchemy's pool. Parent and child then talk over the same socket, which is exactly what produces "Lost connection", "Commands out of sync" and similar errors at random points. Once changed to "spawn", the new process runs outside the application context and, more importantly, builds its own database connections. (If you need to keep fork, the SQLAlchemy documentation recommends calling engine.dispose() in the child immediately after the fork so it discards the inherited pool.)
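The inheritance difference is easy to see with a small stand-alone demo (nothing Flask-specific here; the plain dict stands in for module-level state such as a database connection pool, and the "fork" method assumes a Unix platform):

    import multiprocessing

    state = {"value": 0}  # stands in for shared module-level state

    def report(queue):
        # Runs in the child: report what the child sees in `state`.
        queue.put(state["value"])

    def child_view(method):
        ctx = multiprocessing.get_context(method)
        queue = ctx.Queue()
        state["value"] = 42  # mutate in the parent, after import time
        child = ctx.Process(target=report, args=(queue,))
        child.start()
        child.join()
        return queue.get()

    if __name__ == "__main__":
        print("fork :", child_view("fork"))   # 42 -> forked child inherits parent state
        print("spawn:", child_view("spawn"))  # 0  -> spawned child re-imports from scratch

The forked child sees the mutated value because it is a copy of the parent at fork time; the spawned child re-imports the module and only sees what runs at import time, which is why an engine created in the parent after startup never leaks into it.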