django subprocess wsgi splice zero-copy

Django (or WSGI): chain stdout from a subprocess


I am writing a web service in Django to handle image/video streams, but most of the work is done by an external program. For instance:

  1. The client requests /1.jpg?size=300x200
  2. Python code in Django (or another WSGI app) parses 300x200
  3. Python calls convert (part of ImageMagick) via the subprocess module, passing 300x200
  4. convert reads 1.jpg from local disk and resizes it accordingly
  5. convert writes the result to a temp file
  6. Django builds an HttpResponse() and reads the whole temp file in as the body

As you can see, writing to a temp file and then reading it back is inefficient. I need a generic way to handle external programs like this: not only convert, but also cjpeg, ffmpeg, etc., or even proprietary binaries.

I want to implement it this way:

  1. Python gets the stdout fd of the convert child process
  2. Python chains it to the WSGI socket fd for output
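Even without splice(), the plan above can be done in user space: read from the child's stdout pipe and write straight to the output fd, with no temp file. A minimal sketch, assuming you already have a writable output fd (obtaining the real client socket fd is server-specific and not shown); `pump` is a hypothetical helper name. It reuses one `bytearray` via `readinto()` and slices it with a `memoryview`, so no new bytes object is allocated per chunk.

```python
import os
import subprocess
import sys


def pump(src, dst_fd, bufsize=64 * 1024):
    """Copy all bytes from the binary file object `src` (e.g. proc.stdout)
    to the file descriptor `dst_fd`, reusing one preallocated buffer.
    Returns the number of bytes copied."""
    buf = bytearray(bufsize)
    view = memoryview(buf)
    total = 0
    while True:
        n = src.readinto(buf)  # fills buf in place; 0 means EOF
        if not n:
            return total
        chunk = view[:n]  # slice of the filled part, no data copied
        while chunk:
            # os.write() may write fewer bytes than requested, so keep going
            written = os.write(dst_fd, chunk)
            chunk = chunk[written:]
        total += n


if __name__ == "__main__":
    # Demo: forward a child's output to this process's stdout fd.
    child = subprocess.Popen(
        [sys.executable, "-c", "print('hello from the child')"],
        stdout=subprocess.PIPE,
    )
    pump(child.stdout, sys.stdout.fileno())
    child.stdout.close()
    child.wait()
```

The data still crosses into Python's buffer once, so this is not zero-copy, but it removes the temp file and the extra disk round trip.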

I've done my homework: Google says this kind of zero-copy can be done with the splice() system call, but it's not available in Python. So how can I maximize performance in Python for this kind of scenario?

  1. Call splice() using ctypes?
  2. Hack memoryview() or buffer()?
  3. subprocess's stdout has readinto(); could that be utilized somehow?
  4. How can we get the socket fd number in an arbitrary WSGI app?
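Regarding option 1: on Linux, splice(2) moves data between two fds inside the kernel as long as one of them is a pipe, and a subprocess stdout pipe qualifies. Since Python 3.10 the standard library exposes it as `os.splice()` (Linux only); on older versions a ctypes call into libc is one possible fallback. A sketch under these assumptions: Linux, blocking fds, and a `dst_fd` you already hold (e.g. a client socket obtained in some server-specific way). `splice_all`, `_get_splice` and `_libc_splice` are hypothetical names.

```python
import ctypes
import os


def _libc_splice():
    """Fallback for Python < 3.10: call splice(2) from libc via ctypes."""
    libc = ctypes.CDLL(None, use_errno=True)  # process namespace includes libc
    fn = libc.splice
    # ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out,
    #                size_t len, unsigned int flags);
    fn.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_int,
                   ctypes.c_void_p, ctypes.c_size_t, ctypes.c_uint]
    fn.restype = ctypes.c_ssize_t

    def splice(src, dst, count):
        n = fn(src, None, dst, None, count, 0)  # NULL offsets: use fd positions
        if n < 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        return n

    return splice


def _get_splice():
    if hasattr(os, "splice"):  # Python 3.10+, Linux only
        return os.splice
    return _libc_splice()


def splice_all(pipe_fd, dst_fd, chunk=64 * 1024):
    """Move data from `pipe_fd` to `dst_fd` until EOF without copying it
    through Python buffers. Returns the number of bytes moved."""
    splice = _get_splice()
    total = 0
    while True:
        n = splice(pipe_fd, dst_fd, chunk)
        if n == 0:  # the writer closed the pipe: EOF
            return total
        total += n
```

Writing directly to the server's socket bypasses whatever response framing the server would apply (headers, chunked encoding), so how safely this can be done depends entirely on the server; that is why a portable WSGI app returns an iterable instead.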

I'm fairly new to this; any suggestions are appreciated. Thanks!


Solution

  • I found that a WSGI app can return the child's stdout file object directly as its response iterable

    Example WSGI app:

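    import subprocess

    def image_app(environ, start_response):
        proc = subprocess.Popen([
            'convert',
            '1.jpg',
            '-thumbnail', '200x150',
            'jpg:-',  # '-' = write to stdout; the 'jpg:' prefix fixes the output format
        ], stdin=subprocess.DEVNULL, stdout=subprocess.PIPE,
           stderr=subprocess.DEVNULL)  # an unread stderr PIPE could fill up and stall convert
        # No 'Connection' header: PEP 3333 forbids hop-by-hop headers in applications
        start_response('200 OK', [('Content-Type', 'image/jpeg')])
        # A file object is a valid WSGI response iterable
        return proc.stdout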
    

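    This returns the child's stdout pipe as the HTTP response body. The WSGI server iterates over it and writes each chunk to the client socket, so the temp file is gone, although the data still passes through Python rather than being spliced in the kernel.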
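Returning `proc.stdout` works, but iterating a binary file object yields newline-delimited chunks (arbitrary sizes for JPEG data), and nothing waits for the child to exit. A slightly more defensive sketch, where `ProcessOutput` is a hypothetical class name: it reads bounded blocks with `read1()` and relies on PEP 3333's rule that the server calls `close()` on the returned iterable, which is where the child gets reaped.

```python
import subprocess


class ProcessOutput:
    """Hypothetical WSGI response iterable: streams a child's stdout in
    blocks of at most `blocksize` bytes and reaps the child in close()."""

    def __init__(self, args, blocksize=64 * 1024):
        self.proc = subprocess.Popen(
            args,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,  # an unread stderr pipe could block the child
        )
        self.blocksize = blocksize

    def __iter__(self):
        while True:
            # read1() makes at most one read on the pipe, so data is
            # forwarded as soon as the child produces it.
            block = self.proc.stdout.read1(self.blocksize)
            if not block:  # b"" means the child closed its stdout (EOF)
                return
            yield block

    def close(self):
        # PEP 3333: the server calls close() when the response is finished
        # or aborted. Closing the pipe makes a still-running child fail its
        # next write; then wait for it so no zombie process is left behind.
        self.proc.stdout.close()
        try:
            self.proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            self.proc.kill()
            self.proc.wait()


def image_app(environ, start_response):
    body = ProcessOutput(["convert", "1.jpg", "-thumbnail", "200x150", "jpg:-"])
    start_response("200 OK", [("Content-Type", "image/jpeg")])
    return body
```

Creating the process before calling `start_response()` means a missing binary raises before any headers are committed, so the server can still send an error response.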