Tags: python, ftp, python-requests, dropbox, dropbox-api

Python - Transfer a file from HTTP(S) URL to FTP/Dropbox without disk writing (chunked upload)


I have a large file (500 MB to 1 GB) stored at an HTTP(S) URL
(say https://example.com/largefile.zip).

I have read/write access to an FTP server.

I have normal user permissions (no sudo).

Within these constraints I want to read the file from the HTTP URL via requests and send it to the FTP server without writing to disk first.

So normally, I would do:

import requests

response = requests.get('https://example.com/largefile.zip', stream=True)
with open("largefile_local.zip", "wb") as handle:
    for data in response.iter_content(chunk_size=4096):
        handle.write(data)

and then upload the local file to FTP. But I want to avoid the disk I/O. I cannot mount the FTP server as a FUSE filesystem because I don't have superuser rights.

Ideally, I would call something like ftp_file.write() instead of handle.write(). Is that possible? The ftplib documentation seems to assume only local files will be uploaded, not response.content. So ideally I would like to do:

response = requests.get('https://example.com/largefile.zip', stream=True)
for data in response.iter_content(chunk_size=4096):
    ftp_send_chunk(data)

I am not sure how to write ftp_send_chunk().
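For reference, here is one way such a helper could look. This is a sketch built on ftplib's low-level transfercmd, which opens the data connection for a STOR command and hands back the socket; the name ftp_send_chunks and its signature are hypothetical, and (as the Solution below shows) ftp.storbinary with a file-like object achieves the same with less code.

```python
from ftplib import FTP

def ftp_send_chunks(ftp, remote_path, chunks):
    # Open the data connection for a STOR command ourselves,
    # then push each chunk onto the socket as it arrives.
    conn = ftp.transfercmd("STOR " + remote_path)
    try:
        for data in chunks:
            conn.sendall(data)
    finally:
        conn.close()
    return ftp.voidresp()  # consume the final "226 Transfer complete" reply
```

With requests, the generator response.iter_content(chunk_size=4096) would be passed as chunks.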

There is a similar question here (Python - Upload a in-memory file (generated by API calls) in FTP by chunks). My use case requires retrieving a chunk from the HTTP URL and writing it to FTP.

P.S.: The solution provided in the answer (a wrapper around urllib.urlopen) works with Dropbox uploads as well. I had problems with my FTP provider, so I finally used Dropbox, which works reliably.

Note that Dropbox has an "add web upload" feature in the API which does the same thing (remote upload), but it only works with "direct" links. In my use case the http_url came from a streaming service that was IP-restricted, so this workaround became necessary. Here's the code:

import re
import dropbox

d = dropbox.Dropbox('<ACCESS-TOKEN>')
f = FileWithProgress(filehandle)  # filehandle is the urlopen response; class defined in the answer below
filesize = filehandle.length
targetfile = '/' + fname
CHUNK_SIZE = 4 * 1024 * 1024

# start the upload session with the first chunk
upload_session_start_result = d.files_upload_session_start(f.read(CHUNK_SIZE))
num_chunks = 1
cursor = dropbox.files.UploadSessionCursor(
    session_id=upload_session_start_result.session_id,
    offset=CHUNK_SIZE * num_chunks)
commit = dropbox.files.CommitInfo(path=targetfile)

while CHUNK_SIZE * num_chunks < filesize:
    if (filesize - CHUNK_SIZE * num_chunks) <= CHUNK_SIZE:
        # last chunk: finish the session and commit the file
        print(d.files_upload_session_finish(f.read(CHUNK_SIZE), cursor, commit))
    else:
        d.files_upload_session_append(f.read(CHUNK_SIZE), cursor.session_id, cursor.offset)
    num_chunks += 1
    cursor.offset = CHUNK_SIZE * num_chunks

# create a shared link and rewrite it into a direct-download URL
link = d.sharing_create_shared_link(targetfile)
dl_url = re.sub(r"\?dl=0", "?dl=1", link.url).strip()
print('dropbox_url:', dl_url)
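The shared-link rewrite at the end can be checked in isolation; to_direct_link is a hypothetical helper name used only for this sketch:

```python
import re

def to_direct_link(url):
    # Dropbox shared links end in ?dl=0 (opens a preview page);
    # ?dl=1 makes the browser download the file instead.
    return re.sub(r"\?dl=0", "?dl=1", url).strip()

print(to_direct_link("https://www.dropbox.com/s/abc123/file.zip?dl=0"))
# → https://www.dropbox.com/s/abc123/file.zip?dl=1
```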

I think it should even be possible to do this with Google Drive via their Python API, but handling credentials with their Python wrapper is too hard for me. Check this1 and this2.


Solution

  • It should be easy with urllib.request.urlopen, as it returns a file-like object, which you can use directly with FTP.storbinary.

    from ftplib import FTP
    import urllib.request

    ftp = FTP(host, user, passwd)

    filehandle = urllib.request.urlopen(http_url)

    ftp.storbinary("STOR /ftp/path/file.dat", filehandle)
    

    If you want to monitor progress, implement a wrapper file-like object that will delegate calls to filehandle object, but will also display the progress:

    class FileWithProgress:

        def __init__(self, filehandle):
            self.filehandle = filehandle
            self.p = 0

        def read(self, blocksize):
            r = self.filehandle.read(blocksize)
            self.p += len(r)
            # .length is the number of bytes still unread,
            # so p + length is the total file size
            print(str(self.p) + " of " + str(self.p + self.filehandle.length))
            return r

    filehandle = urllib.request.urlopen(http_url)

    ftp.storbinary("STOR /ftp/path/file.dat", FileWithProgress(filehandle))
    

    For Python 2 use:

    • urllib.urlopen instead of urllib.request.urlopen.
    • filehandle.info().getheader('Content-Length') (the total size) instead of str(self.p + self.filehandle.length).
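To sanity-check the progress wrapper without a network connection, one can substitute a stand-in object that mimics the response's length attribute. FakeResponse below is a hypothetical test double, not part of urllib; the wrapper itself is the same as above.

```python
import io

class FileWithProgress:
    # same wrapper as in the answer above
    def __init__(self, filehandle):
        self.filehandle = filehandle
        self.p = 0

    def read(self, blocksize):
        r = self.filehandle.read(blocksize)
        self.p += len(r)
        print(str(self.p) + " of " + str(self.p + self.filehandle.length))
        return r

class FakeResponse(io.BytesIO):
    # stand-in for the urlopen response: .length is the remaining byte count
    @property
    def length(self):
        return len(self.getvalue()) - self.tell()

f = FileWithProgress(FakeResponse(b"x" * 10))
f.read(4)   # prints "4 of 10"
f.read(4)   # prints "8 of 10"
f.read(4)   # prints "10 of 10"
```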