
Change behavior of Flask-based API depending on whether the client is streaming the response or not


I have an API made in Python/Flask, which returns data as a JSON list of dictionaries. It can return a significant amount of data, so I would like it to be able to stream data to a client. I already have the necessary code to generate the stream of data; I just need to figure out how to provide this stream to clients.

I would like it to be up to the client to decide whether the result is streamed or not, i.e. if they don't stream the response they get a valid list of dicts directly, but if they stream the response they get each dict separated by newlines.

My current implementation is old and based on this blog post and, as pointed out in the blog post's update, it isn't very good: requesting data without streaming results in a proper list of dicts, but streaming it results in a bunch of barely usable chunks of dicts.

I have tried the following new approach (I have somewhat simplified the example; here data_generator provides the data iteratively):

@app.route("/get_data", methods=["GET"])
def get_data(**kwargs):
    def stream_response():
        rows = data_generator
        for row in rows:
            yield json.dumps(row) + "\n"
    return Response(stream_response(), mimetype="application/json")

It works and streams the data dict by dict, but the whole response (when not streamed) is not valid, decodable JSON.
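
A minimal sketch of the problem, using only the standard library. The sample `rows` and the helper names `parse_whole` and `parse_lines` are invented for illustration and are not part of the API above:

```python
import json

# Hypothetical sample data standing in for what data_generator yields.
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# What the streaming endpoint emits: one JSON document per line.
jsonl_body = "".join(json.dumps(row) + "\n" for row in rows)

# What a non-streaming client expects: a single JSON array.
array_body = json.dumps(rows)


def parse_whole(body: str):
    """Decode the body as one JSON document, or return None if that fails."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


def parse_lines(body: str):
    """Decode a newline-delimited body one line at a time."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


print(parse_whole(array_body) == rows)  # the array decodes in one call
print(parse_whole(jsonl_body))          # the line-delimited body does not
print(parse_lines(jsonl_body) == rows)  # but it decodes line by line
```

So the two bodies carry the same data, but a client has to know in advance which of the two parsers to use.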

Is there a standard way to detect server-side whether a client is asking for a stream of data or the whole result, so I can return valid data in both cases? I have thought of adding an argument or header to the API so that clients can specify what they need, but I would like a more "universal" approach.


Solution

  • First, it sounds like your streaming data is using the JSON Lines format. There's no native Flask support for that, I'm afraid, but knowing the name might help find more resources.

    Is there a standard way to detect server-side if a client is asking for a stream of data

    I don't think there is a reliable way of detecting streaming clients, especially before you even send the first byte of the response body.

    My suggestion is to make the client signal which one it wants. And there's already a perfectly fine header for that, Accept: mimetype.

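    import json

    from flask import Response, jsonify, request

    # `app` and `data_generator` are defined as in the question.
    @app.route("/get_data", methods=["GET"])
    def get_data(**kwargs):
        # Not an official mimetype, but already in use by AWS.
        # best_match() is used instead of a plain lookup because a wildcard
        # header such as "Accept: */*" would otherwise select the stream too.
        wanted = request.accept_mimetypes.best_match(
            ["application/json", "application/jsonlines"], default="application/json")
        if wanted == "application/jsonlines":
            def stream_response():
                rows = data_generator
                for row in rows:
                    yield json.dumps(row) + "\n"
            return Response(stream_response(), mimetype="application/jsonlines")
        else:
            # Send all data at once.
            return jsonify(list(data_generator))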
    

    Then, on the streaming client side, you must set the Accept header, like so (JavaScript example):

    fetch('/get_data', {headers: {'Accept': 'application/jsonlines'}})
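
A streaming client also has to cope with network chunks that do not line up with row boundaries, which is what produces the "barely usable chunks of dicts" mentioned in the question. The following is a rough sketch of one way to handle that in Python with the standard library only; `iter_jsonlines` is a hypothetical helper name, and the `chunks` list merely simulates what a streamed response body might deliver:

```python
import json
from typing import Any, Iterable, Iterator


def iter_jsonlines(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield one decoded object per complete line, buffering partial lines."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; the rest stays buffered.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    # A final line without a trailing newline is still a complete document.
    if buffer.strip():
        yield json.loads(buffer)


# Simulated network chunks that split the second row in the middle.
chunks = [b'{"id": 1}\n{"i', b'd": 2}\n', b'{"id": 3}']
rows = list(iter_jsonlines(chunks))
print(rows)
```

If the client uses the `requests` library, a request made with `stream=True` together with `response.iter_lines()` should give similar line-by-line buffering without a hand-written helper.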