Streaming data from Google Cloud Storage to an FTP Server

I'm trying to use gcsfs and ftplib to stream a CSV, line by line or in chunks, from Cloud Storage to an FTP server. The files in GCS are too large to hold in memory, so I'm testing this approach:

from ftplib import FTP
import gcsfs
from urllib import request
import io

ftp = FTP('my-ftp-server')

fs = gcsfs.GCSFileSystem(project='my-project')

with fs.open('myFile.csv') as f:
    ftp.storlines("STOR myFile.csv", f)

but I get the error:

TypeError                                 Traceback (most recent call last)
<ipython-input-56-d461792392dd> in <module>
      1 with fs.open('myfile') as f:
----> 2     ftp.storlines("STOR myFile.csv", f)

~\.conda\envs\py3.7\lib\ in storlines(self, cmd, fp, callback)
    530         with self.transfercmd(cmd) as conn:
    531             while 1:
--> 532                 buf = fp.readline(self.maxline + 1)
    533                 if len(buf) > self.maxline:
    534                     raise Error("got more than %d bytes" % self.maxline)

TypeError: readline() takes 1 positional argument but 2 were given

Any suggestions on how I can fix this or achieve what I want?


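  • Indeed, the file object returned by fs.open (an fsspec AbstractBufferedFile, on which gcsfs builds its file class) does not seem to be compatible with ftplib: its readline method takes no size argument, while FTP.storlines calls fp.readline(self.maxline + 1).

    Do you need to use FTP.storlines (text mode)? Can't you use FTP.storbinary (binary mode) instead?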

    with fs.open('myFile.csv') as f:
        ftp.storbinary("STOR myFile.csv", f)

    FTP.storbinary transfers the file by chunks (defined by an optional parameter blocksize with the default value of 8192).

    If you do need text mode, you will have to implement a wrapper class with an API compatible with FTP.storlines:

    class GCSFileSystemCompat:
        def __init__(self, f):
            self.f = f

        def readline(self, size=-1):
            # storlines passes a size hint; the wrapped readline() accepts none
            return self.f.readline()

    with fs.open('myFile.csv') as f:
        ftp.storlines("STOR myFile.csv", GCSFileSystemCompat(f))

    (untested, but it should give you the idea)
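
The wrapper idea can be checked locally without a GCS bucket or an FTP server. The sketch below is only an illustration: `ReadlineCompat` and `NoSizeFile` are hypothetical names, not part of gcsfs, fsspec, or ftplib, and `NoSizeFile` merely imitates a file object whose readline() rejects a size argument, as in the traceback above.

```python
import io


class ReadlineCompat:
    """Adapt a file whose readline() takes no arguments to ftplib's call style.

    Hypothetical helper, not part of gcsfs, fsspec, or ftplib.
    """

    def __init__(self, f):
        self.f = f

    def readline(self, size=-1):
        # FTP.storlines calls fp.readline(self.maxline + 1). The size hint is
        # accepted and ignored here, so the whole line is always returned.
        return self.f.readline()


class NoSizeFile:
    """Stand-in for a file object whose readline() rejects a size argument."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def readline(self):
        return self._buf.readline()


if __name__ == "__main__":
    wrapped = ReadlineCompat(NoSizeFile(b"a,b\n1,2\n"))
    # Same call shape that FTP.storlines uses internally; stop at end of file.
    for line in iter(lambda: wrapped.readline(8193), b""):
        print(line)
```

Because the size hint is ignored, a line longer than ftplib's maxline (8192 bytes by default) is returned whole and storlines will raise its "got more than ... bytes" error. For CSVs with very long rows, the binary-mode transfer with FTP.storbinary is likely the simpler route.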