I am asking this out of curiosity.
What I am doing: writing a DataFrame to a temporary file with to_csv() and uploading that file to an FTP server.
The tempfile is opened in binary mode by default, but the to_csv() method writes in text mode by default (which I need because I want UTF-8 output). So how can you write in text mode to a file opened in binary mode? I also need binary mode for the transfer to the FTP server.
What I did in detail:
I created a temp file like this:
fp = tempfile.NamedTemporaryFile(delete=False)
As I understand from the documentation, the file is opened in binary mode:
tempfile.NamedTemporaryFile(mode='w+b', buffering=-1, encoding=None, newline=None, suffix=None, prefix=None, dir=None, delete=True, *, errors=None)
Then I saved my dataframe to the temp file like this:
df.to_csv(fp.name)
fp.flush()
fp.seek(0)
The documentation for to_csv() also states that a file object passed to it should be opened with newline='', which only works in text mode. So I couldn't set the newline argument on a file opened in binary mode.
path_or_buf: str or file handle, default None. File path or object, if None is provided the result is returned as a string. If a file object is passed it should be opened with newline='', disabling universal newlines.
Then I used the storbinary() method from ftplib to push the temp file to the FTP server. As I understand from the documentation, the method requires a binary file.
FTP.storbinary(cmd, fp, blocksize=8192, callback=None, rest=None)
Store a file in binary transfer mode. cmd should be an appropriate STOR command: "STOR filename". fp is a file object (opened in binary mode) which is read until EOF using its read() method in blocks of size blocksize to provide the data to be stored. The blocksize argument defaults to 8192. callback is an optional single parameter callable that is called on each block of data after it is sent. rest means the same thing as in the transfercmd() method.
For completeness I afterwards closed and deleted the file like this:
fp.close()
os.unlink(fp.name)
I thought about opening the tempfile in w+t mode so that it matches the to_csv() method, which recommends opening the file with newline='', and that only works in text mode. I also need to specify UTF-8 encoding for the CSV file, which likewise only works in text mode. But ftplib's storbinary() method requires a file opened in binary mode (the storlines() method does too), so this doesn't fit.
So I opened the file in binary mode, wrote to it in text mode and transferred it in binary mode. Everything works and the result looks the way I want, but I am a bit confused about whether I am doing it the right way. How does writing in text mode to a file opened in binary mode work? I assumed I would have to open the file in text mode in order to write to it in text mode using to_csv().
If anyone has a deeper knowledge about this and could clear up my confusion I would be very grateful. I don't like doing things not knowing why they work or if they should work haha.
Thanks!
This is quite a broad question, so just briefly: it is mostly about line endings. Apart from the text encoding, that is basically the only distinction between binary and text modes.
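To make the line-ending point concrete, here is a minimal sketch using only the standard library. The helper name bytes_written is made up for this illustration; it writes the same text in text mode with two different newline settings and reads the result back in binary mode.

```python
import os
import tempfile


def bytes_written(text, newline):
    """Write text in text mode (UTF-8) and return the raw bytes found on disk.

    bytes_written is a hypothetical helper for this illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode: str is encoded to bytes and "\n" is translated
        # according to the newline argument.
        with open(path, "w", encoding="utf-8", newline=newline) as handle:
            handle.write(text)
        # Binary mode: the bytes are returned exactly as stored.
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        os.unlink(path)


crlf = bytes_written("a,b\n1,2\n", newline="\r\n")  # "\n" becomes "\r\n"
untouched = bytes_written("a,b\n1,2\n", newline="")  # no translation
```

The text passed in is identical in both calls; only the newline argument changes what lands in the file.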
When you open a file in text mode, line endings (\n) are converted according to the newline parameter.
With pandas, you control the line endings using the line_terminator (str) parameter of DataFrame.to_csv.
If you use storbinary, the file will be uploaded as is. If you use storlines, you let the FTP server convert the line endings.
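As for why the code in the question works: df.to_csv(fp.name) receives a path, not the binary file object, so the writer opens its own text-mode handle on the same file. The sketch below imitates that with the standard library only. It assumes the csv module as a stand-in for to_csv(), and the helpers write_rows_via_path and read_in_blocks are hypothetical names; the second one only mimics the way storbinary() reads its fp argument, and no FTP connection is made.

```python
import csv
import os
import tempfile


def write_rows_via_path(path, rows):
    # Stand-in for df.to_csv(path): a separate text-mode handle is opened
    # on the same path, independent of the binary handle held by the caller.
    with open(path, "w", newline="", encoding="utf-8") as text_handle:
        csv.writer(text_handle, lineterminator="\n").writerows(rows)


def read_in_blocks(binary_handle, blocksize=8192):
    # Mimics how storbinary() consumes fp: call read() until EOF.
    chunks = []
    while True:
        block = binary_handle.read(blocksize)
        if not block:
            break
        chunks.append(block)
    return b"".join(chunks)


fp = tempfile.NamedTemporaryFile(delete=False)  # binary mode ('w+b') by default
try:
    write_rows_via_path(fp.name, [["name", "city"], ["Zoë", "Köln"]])
    fp.seek(0)
    payload = read_in_blocks(fp)  # UTF-8 bytes, exactly as stored on disk
finally:
    fp.close()
    os.unlink(fp.name)
```

In the real upload, a call such as ftp.storbinary("STOR data.csv", fp) would take the place of read_in_blocks (the file name data.csv is only an example). The binary handle never does any text-mode writing itself; it just reads back the bytes that the text-mode handle produced.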