python rest flask pid flask-restful

Checking the status of a process using its PID from Python


My REST API, written in Python, spawns processes that take about 3 minutes to complete. I store each PID in a global list and have set up a second route that should report whether the process is still running or has finished.

The only approaches I can find are to poll the subprocess object (which I don't have access to in this route) or to try to kill the process to see whether it is alive. Is there a clean way to get a yes/no answer, based on the PID, as to whether the process is still running and, if it is not, whether it completed successfully?

from flask import Flask, jsonify, request, Response
from subprocess import Popen, PIPE
import os

app = Flask(__name__)

QUEUE_ID = 0

jobs = []

@app.route("/compile", methods=["POST"])
def compileFirmware():

    f = request.files['file']
    f.save(f.filename)

    os.chdir("/opt/src/2.0.x")

    process = Popen(['platformio', 'run', '-e', 'mega2560'], stdout=PIPE, stderr=PIPE, universal_newlines=True)

    global QUEUE_ID
    QUEUE_ID += 1

    data = {'id':QUEUE_ID, 'pid':process.pid}
    jobs.append(data)

    output, errors = process.communicate()
    print (output)
    print (errors)

    response = jsonify()
    response.status_code = 202 #accepted
    response.headers['location'] = '/queue/' + str(QUEUE_ID)
    response.headers.add('Access-Control-Allow-Origin', '*')
    return response

@app.route("/queue/<id>", methods=["GET"])
def getStatus(id):

    #CHECK PID STATUS HERE

    content = {'download_url': 'download.com'}
    response = jsonify(content)
    response.headers.add('Access-Control-Allow-Origin', '*')
    return response

if __name__ == '__main__':
    app.run(host='0.0.0.0',port=8080)

Solution

  • Keep the Popen object itself rather than just the PID, and call poll() on it from the status route. Here is a small simulation that works:

    from flask import Flask, jsonify, request, Response, abort
    from subprocess import Popen, PIPE
    import os
    
    app = Flask(__name__)
    
    QUEUE = { }
    
    @app.route("/compile", methods=["POST"])
    def compileFirmware():
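        # Stand-in for the real build command: a child process that sleeps for 5 minutes.
        # No extra quotes around the code: with a list argument they would be passed literally.
        process = Popen(['python', '-c', 'import time; time.sleep(300)'], stdout=PIPE, stderr=PIPE, universal_newlines=True)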
        QUEUE[str(process.pid)] = process # String because in GET the url param will be interpreted as str
        response = jsonify()
        response.status_code = 202 #accepted
        response.headers['location'] = '/queue/' + str(process.pid)
        response.headers.add('Access-Control-Allow-Origin', '*')
        return response
    
    @app.route("/queue/<id>", methods=["GET"])
    def getStatus(id):
        process = QUEUE.get(id, None)
        if process is None:
            abort(404, description="Process not found")
        retcode = process.poll()
        if retcode is None:
            content = {'download_url': None, 'message': 'Process is still running.'}
        else:
            # QUEUE.pop(id) # Remove reference from QUEUE ?
            content = {'download_url': 'download.com', 'message': f'process has completed with retcode: {retcode}'}
        response = jsonify(content)
        response.headers.add('Access-Control-Allow-Origin', '*')
        return response
    
    if __name__ == '__main__':
        app.run(host='0.0.0.0',port=8080)
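
The core of the approach above is `Popen.poll()`, which returns `None` while the child is still running and its exit code once it has finished. The sketch below isolates that behaviour outside Flask. The helper names `start` and `describe` are made up for illustration, and the child commands are stand-ins for the real build.

```python
import subprocess
import sys


def start(code):
    """Start a child Python interpreter running `code` (a stand-in for the real build command)."""
    return subprocess.Popen([sys.executable, "-c", code])


def describe(process):
    """Turn the result of poll() into a short status string."""
    retcode = process.poll()  # None while running, otherwise the exit code
    if retcode is None:
        return "running"
    if retcode == 0:
        return "succeeded"
    return f"failed with exit code {retcode}"


if __name__ == "__main__":
    job = start("import time; time.sleep(1)")
    print(describe(job))  # checked immediately, while the child is still sleeping
    job.wait()            # block until the child exits (the status route would not do this)
    print(describe(job))
```

By the usual convention an exit code of 0 means success, so the status route can report both "still running" and "completed successfully" from the same call.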
    

    There are further considerations if this application is going to be used as more than a personal project:

    • The QUEUE global variable stores the process handles. In a real deployment, a WSGI server such as Gunicorn can run multiple worker processes, each with its own copy of that global, so a status request may land on a worker that never saw the job. To scale up, consider keeping job state in a shared store such as Redis or a message queue instead.

    • Does QUEUE ever need to be cleaned up, and if so, when? If you remove an entry as soon as its result has been fetched once, the next GET for the same ID returns 404. Whether the GET endpoint must be idempotent is a design decision (most likely yes).
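
One possible way to reconcile cleanup with an idempotent GET is to keep each finished job's result for a grace period and only then forget it. The following is a minimal in-memory sketch of that idea, not part of the original answer. `JobRegistry`, its `ttl` parameter and the injectable `clock` are hypothetical names, and it does not address the multi-worker problem, since it still lives inside a single process.

```python
import time


class JobRegistry:
    """Tracks running jobs and remembers finished results for `ttl` seconds."""

    def __init__(self, ttl=600.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock      # injectable so the expiry logic can be tested
        self._running = {}      # job_id -> object with a poll() method (e.g. Popen)
        self._finished = {}     # job_id -> (returncode, time first seen finished)

    def add(self, job_id, process):
        self._running[job_id] = process

    def status(self, job_id):
        """Return a status dict, or None if the job is unknown or has expired."""
        self._prune()
        if job_id in self._running:
            retcode = self._running[job_id].poll()
            if retcode is None:
                return {"running": True, "returncode": None}
            # Finished: drop the process handle and keep only the result.
            # The grace period starts when the finish is first observed here.
            del self._running[job_id]
            self._finished[job_id] = (retcode, self.clock())
        if job_id in self._finished:
            retcode, _ = self._finished[job_id]
            return {"running": False, "returncode": retcode}
        return None

    def _prune(self):
        """Forget finished jobs whose grace period has elapsed."""
        now = self.clock()
        expired = [j for j, (_, seen) in self._finished.items() if now - seen > self.ttl]
        for job_id in expired:
            del self._finished[job_id]
```

In the status route, a `None` result would map to the 404 response, and repeated GETs within the grace period keep returning the same answer.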