Some time ago I wrote a Flask application in Python 3, which spawns a new thread that does some processing, then updates the application DB (SQLite) on a regular basis. The function I use to update the DB is quite simple:
    def update_db(tsid, **kwargs):
        # FLASK_APP is my main Flask() object, set once when thread starts
        # DATABASE is the SQLAlchemy DB connection, set once when thread starts
        # TrainingSession is my custom DB model object (from SQLAlchemy.Model)
        with FLASK_APP.app_context():
            dbrec = TrainingSession.query.filter_by(id=tsid).first()
            if dbrec:
                for key, value in kwargs.items():
                    setattr(dbrec, key, value)
                DATABASE.session.commit()
This worked quite well: my thread spawned several other threads, each of them calling the update_db() function above, which updated the DB correctly.
Now I have changed my code, because I wanted to use a process instead of a thread (which would then again spawn those other threads). So I did nothing more than inherit my class from Python's multiprocessing.Process instead of threading.Thread. This child process now does all the work, and still does it correctly.
Very well. But what I don't understand is why my DB updates still work as before, even though I have not changed the logic shown above. The global variables FLASK_APP and DATABASE are set in my main application process, before the new process is started. Then, in this new process, I still call the global function update_db() as above, and that still works.
First, how can another process access these two global variables? They are not passed from the main process to the child process; the latter simply accesses them without any change in code (since they are global objects in the Python script). Moreover, how can the child process access and update the DB, if both the FLASK_APP and DATABASE objects were created in the main process (and, again, never passed to the child process)?
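This is because multiprocessing uses fork() to create the new process by default on Linux (Windows and macOS default to spawn, and Python 3.14 changed the default on other POSIX systems to forkserver). With fork(), the child starts as a copy of the parent's entire memory state, so every global that existed in the parent at that moment, including FLASK_APP and DATABASE, exists in the child too. They are copies rather than shared objects, but both copies point at the same SQLite file on disk, which is why the child's commits end up in the same database.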
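To make the inheritance visible, here is a minimal sketch that has nothing to do with Flask or SQLAlchemy. SETTINGS, configure(), report_settings() and child_view() are made-up names for illustration, and the "fork" start method is only available on POSIX systems. The parent sets a global at runtime, the forked child reads it without being handed anything, and a change made in the child stays in the child.

```python
import multiprocessing as mp

# Hypothetical stand-in for runtime globals such as FLASK_APP or DATABASE.
SETTINGS = None


def configure(value):
    """Set the module-level global at runtime, as the parent process would."""
    global SETTINGS
    SETTINGS = value


def report_settings(queue):
    """Runs in the child: report the inherited global, then change the child's copy."""
    global SETTINGS
    queue.put(SETTINGS)
    SETTINGS = "changed in child"  # affects only the child's memory


def child_view(start_method="fork"):
    """Start a child with the given start method and return the SETTINGS it saw."""
    ctx = mp.get_context(start_method)
    queue = ctx.Queue()
    proc = ctx.Process(target=report_settings, args=(queue,))
    proc.start()
    seen = queue.get(timeout=30)
    proc.join()
    return seen


if __name__ == "__main__":
    configure("set in parent")  # never passed to the child explicitly
    print(child_view("fork"))   # the child inherited the parent's value
    print(child_view("fork"))   # same again: the first child's change never came back
```

If you call child_view("spawn") from a script that assigns SETTINGS only under the if __name__ == "__main__": guard, the child should report None instead, because it re-imports the module rather than copying the parent's memory.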
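If you switch multiprocessing to the spawn start method, each child starts a fresh interpreter that re-imports your main module instead of inheriting the parent's memory. Anything the parent assigned at runtime (for example inside a function, or under if __name__ == "__main__":) will not exist in the child, and your code will no longer magically work.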
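One way to stop depending on the start method is to pass the child what it needs and let it open its own connection. The following is only a sketch of that idea using the standard-library sqlite3 module, not your Flask-SQLAlchemy setup; the training_session table, its status column, and the update_status() and run_update() helpers are assumptions made up for the example.

```python
import multiprocessing as mp
import os
import sqlite3
import tempfile


def update_status(db_path, session_id, status):
    """Runs in the child: open a fresh connection from the path it was given."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE training_session SET status = ? WHERE id = ?",
                (status, session_id),
            )
    finally:
        conn.close()


def run_update(db_path, session_id, status, start_method=None):
    """Run update_status in a child process and return the child's exit code."""
    ctx = mp.get_context(start_method)  # None selects the platform default
    proc = ctx.Process(target=update_status, args=(db_path, session_id, status))
    proc.start()
    proc.join()
    return proc.exitcode


if __name__ == "__main__":
    # Build a throwaway database with the hypothetical table.
    db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(
            "CREATE TABLE training_session (id INTEGER PRIMARY KEY, status TEXT)"
        )
        conn.execute("INSERT INTO training_session (id, status) VALUES (1, 'new')")
    conn.close()

    # The child gets only plain arguments, so no inherited globals are needed.
    run_update(db_path, 1, "done", start_method="spawn")

    conn = sqlite3.connect(db_path)
    print(conn.execute("SELECT status FROM training_session WHERE id = 1").fetchone())
    conn.close()
```

Because the child receives only a path and plain values, the same code behaves the same way under fork, spawn, or forkserver.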
The docs explain it at https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods