I'm trying to use gcsfs and ftplib to transfer a CSV line by line (or in chunks) from Cloud Storage to an FTP server. The files in GCS are too large to hold in memory, so I'm testing this streaming approach.
from ftplib import FTP
import gcsfs
from urllib import request
import io
ftp = FTP('my-ftp-server')
fs = gcsfs.GCSFileSystem(project='my-project')
with fs.open('myFile.csv') as f:
    ftp.storlines("STOR myFile.csv", f)
but I get the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-56-d461792392dd> in <module>
1 with fs.open('myfile') as f:
----> 2 ftp.storlines("STOR myFile.csv", f)
~\.conda\envs\py3.7\lib\ftplib.py in storlines(self, cmd, fp, callback)
530 with self.transfercmd(cmd) as conn:
531 while 1:
--> 532 buf = fp.readline(self.maxline + 1)
533 if len(buf) > self.maxline:
534 raise Error("got more than %d bytes" % self.maxline)
TypeError: readline() takes 1 positional argument but 2 were given
Any suggestions on how I can fix this or achieve what I want?
Indeed, fsspec (on which GCSFileSystem is based), and in particular the readline method of the file objects it returns, is not compatible with ftplib: that readline accepts no size argument, while FTP.storlines calls fp.readline(self.maxline + 1).
Do you need to use FTP.storlines (text mode)? Can't you use FTP.storbinary (binary mode) instead?
with fs.open('myFile.csv') as f:
    ftp.storbinary("STOR myFile.csv", f)
FTP.storbinary transfers the file in chunks, whose size is set by the optional blocksize parameter (default 8192 bytes).
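For example, if the default chunk size is too small for your network, you can pass a larger one (a minimal sketch reusing the ftp and fs objects from the question; the 32 KiB value is only an illustration):
# Stream the GCS object to the FTP server in binary mode; storbinary reads
# blocksize bytes at a time via fp.read(), so only one chunk is held in
# memory at once (plus gcsfs's own read buffer).
with fs.open('myFile.csv', 'rb') as f:
    ftp.storbinary("STOR myFile.csv", f, blocksize=32768)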
If not, you will have to implement a wrapper class exposing the readline(size) API that FTP.storlines expects:
class GCSFileSystemCompat:
    """Wraps a gcsfs file object to expose the readline(size) signature ftplib expects."""

    def __init__(self, f):
        self.f = f

    def readline(self, size):
        # The underlying gcsfs readline() accepts no size argument; the value
        # ftplib passes is ignored here (ftplib still checks the line length itself).
        return self.f.readline()

with fs.open('myFile.csv') as f:
    ftp.storlines("STOR myFile.csv", GCSFileSystemCompat(f))
(untested, but it should give you the idea)
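For completeness, here is a self-contained sketch of the wrapper approach, reusing the GCSFileSystemCompat class defined above (untested; 'my-ftp-server', 'my-project' and 'myFile.csv' are the placeholders from the question, and ftp.login() assumes anonymous access unless you pass credentials):
from ftplib import FTP
import gcsfs

ftp = FTP('my-ftp-server')                      # placeholder host
ftp.login()                                     # pass user/passwd here if required
fs = gcsfs.GCSFileSystem(project='my-project')  # placeholder project

# If individual CSV lines can exceed ftplib's 8192-byte default cap, raise it
# before the transfer, e.g. ftp.maxline = 1 << 20 for lines up to 1 MiB.
with fs.open('myFile.csv', 'rb') as f:          # bytes mode, which storlines expects
    ftp.storlines("STOR myFile.csv", GCSFileSystemCompat(f))

ftp.quit()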