
python tornado download remote file


I want to let users download a remote file under a filename I choose. The HTML download attribute below works when the file is on our own server, but for a remote file it has no effect and the file downloads as somerandomname.pdf:

<a href="http://file.com/somerandomname.pdf" download="mypdf.pdf">DOWNLOAD</a>

So I tried serving it through a Python (Tornado) handler instead. That works, and the file downloads under the filename I want. The problem is that the browser shows nothing until the download is complete: there is no download progress, the page just keeps loading while the backend fetches the remote file. Is there a way to fix this?

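import urllib2  # Python 2 (urllib.request.urlopen in Python 3)

# get() of a tornado.web.RequestHandler subclass
def get(self):
    url = self.get_argument('url')
    filename = self.get_argument('filename')
    self.set_header('Content-Type', 'application/octet-stream')
    self.set_header('Content-Disposition', 'attachment; filename=%s' % filename)
    # Blocking: the whole remote file is read before anything is sent
    f = urllib2.urlopen(url)
    self.write(f.read())
    self.finish()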
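A side note on the header itself: building it as 'attachment; filename=%s' % filename (here and in the answer below) gives a malformed value when the name contains spaces, quotes, or non-ASCII characters, and browsers may then truncate or mangle the filename. A common remedy is to send a quoted ASCII fallback together with the RFC 6266 filename* parameter. The sketch below assumes a hypothetical content_disposition() helper built only on the standard library:

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Hypothetical helper: build an attachment Content-Disposition value.

    `filename` gets a quoted, ASCII-only fallback for older clients, and
    `filename*` (RFC 6266 / RFC 5987) carries the exact name, UTF-8 encoded
    and percent-escaped, for browsers that support it.
    """
    # Replace anything that can't appear safely in a quoted ASCII string
    # (non-ASCII, control characters, quotes, backslashes) with "_".
    fallback = "".join(
        c if 0x20 <= ord(c) < 0x7F and c not in '"\\' else "_" for c in filename
    )
    # safe="" makes quote() escape "/" too; unreserved characters stay as-is.
    encoded = quote(filename, safe="")
    return "attachment; filename=\"%s\"; filename*=UTF-8''%s" % (fallback, encoded)
```

In the handler this would be used as self.set_header('Content-Disposition', content_disposition(filename)).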

Solution

    1. Headers are not sent as soon as you call set_header(); they are buffered until you call flush() or finish(). (Among other things, this buffering is what makes it possible to replace the output with an error page if an exception is raised before flush() is called.)

    2. Even if you call flush(), the entire server is blocked during the call to urlopen(). This is a blocking call which must be replaced with an asynchronous version in Tornado (see the user's guide for more on this). Tornado provides an asynchronous HTTP client which can be used in place of urlopen():

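      from tornado import gen
      from tornado.httpclient import AsyncHTTPClient

      # Tornado 5+ also accepts "async def get" with "await" in place of
      # @gen.coroutine and "yield".
      @gen.coroutine
      def get(self):
          url = self.get_argument('url')
          filename = self.get_argument('filename')
          self.set_header('Content-Type', 'application/octet-stream')
          self.set_header('Content-Disposition', 'attachment; filename=%s' % filename)
          # Send the headers now so the browser starts the download right away
          self.flush()
          # Non-blocking fetch: the server keeps handling other requests meanwhile
          response = yield AsyncHTTPClient().fetch(url)
          self.finish(response.body)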
      
    3. This process loads the entire remote file into memory at once, and doesn't send any of it to the browser until the whole thing has been read from the remote server. If the file is large, you may wish to read it in chunks and send them back to the client as they are read:

      # inside get() as above, after self.flush():
      def streaming_callback(chunk):
          self.write(chunk)
          self.flush()
      yield AsyncHTTPClient().fetch(url, streaming_callback=streaming_callback)
      self.finish()
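Putting points 1 to 3 together, here is a minimal, self-contained sketch of the streaming version for current Tornado (5+/6), using async def/await instead of @gen.coroutine. DownloadHandler, make_app, and the /download route are hypothetical names. Like the original handler, it fetches whatever url the client supplies, so a real deployment should restrict it to trusted hosts.

```python
from tornado import web
from tornado.httpclient import AsyncHTTPClient


class DownloadHandler(web.RequestHandler):
    """Hypothetical handler: streams ?url=... to the browser as ?filename=..."""

    async def get(self):
        url = self.get_argument("url")
        filename = self.get_argument("filename")
        self.set_header("Content-Type", "application/octet-stream")
        # Quoted so names with spaces survive (quotes/non-ASCII need more care).
        self.set_header("Content-Disposition", 'attachment; filename="%s"' % filename)
        # Point 1: push the headers out now so the browser shows the download
        # immediately instead of waiting for the whole file.
        await self.flush()

        def on_chunk(chunk):
            # Point 3: forward each piece of the remote body as it arrives
            # instead of holding the entire file in memory.
            self.write(chunk)
            self.flush()  # not awaited: chunks queue up if the client is slow

        # Point 2: non-blocking fetch, so other requests keep being served.
        # If this raises after the flush above, the 200 status has already
        # been sent and the browser only sees a truncated download.
        await AsyncHTTPClient().fetch(url, streaming_callback=on_chunk)
        self.finish()


def make_app():
    return web.Application([(r"/download", DownloadHandler)])
```

To try it, call make_app().listen(8888) inside a running event loop and open http://localhost:8888/download?url=...&filename=mypdf.pdf. Because the flush() inside on_chunk is not awaited, the proxy reads from the remote server as fast as it can; if the browser is slower, the unsent data accumulates in Tornado's write buffer, which is the trade-off for keeping the callback simple.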