python pandas csv ftp ftplib

Python - Write in text mode to file opened in binary mode


I am asking this out of curiosity.

What I am doing:

  • creating a temp file
  • writing data from a Pandas dataframe to it by using to_csv()
  • pushing the file to an FTP server

The tempfile is opened in binary mode by default, but to_csv() writes in text mode by default (which I need, because I want the file to be UTF-8 encoded). So I am asking myself: how can you write in text mode to a file opened in binary mode? I also need binary mode for the transfer to the FTP server.

What I did in detail:

I created a temp file like this:

import tempfile

fp = tempfile.NamedTemporaryFile(delete=False)

As I understand from the documentation, the file is opened in binary mode:

tempfile.NamedTemporaryFile(mode='w+b', buffering=-1, encoding=None, newline=None, suffix=None, prefix=None, dir=None, delete=True, *, errors=None)

Then I saved my dataframe to the temp file like this:

df.to_csv(fp.name)
fp.flush()
fp.seek(0)

The to_csv() documentation also states that a file object should be opened with newline='', which only works in text mode. So I couldn't set the newline argument on a file opened in binary mode.

path_or_buf : str or file handle, default None. File path or object; if None is provided, the result is returned as a string. If a file object is passed, it should be opened with newline='', disabling universal newlines.

Then I used the storbinary() method from ftplib to push the temp file to the FTP server. As I understand from the documentation, the method requires a file opened in binary mode.

FTP.storbinary(cmd, fp, blocksize=8192, callback=None, rest=None)
Store a file in binary transfer mode. cmd should be an appropriate STOR command: "STOR filename". fp is a file object (opened in binary mode) which is read until EOF using its read() method in blocks of size blocksize to provide the data to be stored. The blocksize argument defaults to 8192. callback is an optional single parameter callable that is called on each block of data after it is sent. rest means the same thing as in the transfercmd() method.

For completeness, I closed and deleted the file afterwards like this:

import os
fp.close()
os.unlink(fp.name)
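
Put together, the steps above might look roughly like the sketch below. It is only an illustration: upload_dataframe and remote_name are made-up names, and ftp is assumed to be an already connected ftplib.FTP object (or anything with a compatible storbinary() method). One detail is worth noticing: to_csv() is given fp.name, which is a path string and not the file object itself.

```python
import os
import tempfile

import pandas as pd


def upload_dataframe(df, ftp, remote_name):
    """Write df to a temporary CSV file and upload it via ftp.storbinary().

    Assumptions: `ftp` is a connected ftplib.FTP instance (or a stand-in
    with a compatible storbinary method); `remote_name` is a hypothetical
    target file name on the server.
    """
    fp = tempfile.NamedTemporaryFile(delete=False)  # opened in 'w+b' mode
    try:
        # to_csv() receives a path string here, so pandas opens the file
        # on its own in text mode and encodes the CSV text as UTF-8.
        # Nothing is written through the binary handle fp.
        df.to_csv(fp.name, encoding="utf-8")
        fp.flush()
        fp.seek(0)
        # storbinary() reads raw bytes from the binary handle until EOF.
        ftp.storbinary(f"STOR {remote_name}", fp)
    finally:
        fp.close()
        os.unlink(fp.name)
```

With a real server, the call would be something like upload_dataframe(df, ftplib.FTP(...), "report.csv"), where the connection details and file name are placeholders.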

I thought about opening the tempfile in w+t mode so that it matches the to_csv() method, which recommends opening the file with newline='', and that only works in text mode. I also need to specify UTF-8 encoding for the CSV file, which again only works in text mode. But ftplib's storbinary() method requires a file opened in binary mode (as does storlines()), so a text-mode file doesn't fit.

So I opened the file in binary mode, wrote to it in text mode and transferred it in binary mode. Everything works and the result looks the way I want, but I am unsure whether I am doing it the right way. How does writing in text mode to a file opened in binary mode work? I assumed I would have to open the file in text mode in order to write to it in text mode using to_csv().

If anyone has a deeper knowledge about this and could clear up my confusion I would be very grateful. I don't like doing things not knowing why they work or if they should work haha.

Thanks!


Solution

  • This is quite a broad question, so just briefly: it is mostly about line endings. Apart from encoding str to bytes, that is basically the only distinction between binary and text mode.

    • If you open a file in binary mode, all data are written exactly as they are. If you open a file in text mode, newlines (\n) are converted according to the newline parameter.
    • I do not think that Pandas needs the file to be opened in text mode. If you open the file in binary mode, then whatever Pandas writes will end up physically in the file. See the line_terminator (str) parameter of DataFrame.to_csv (renamed to lineterminator in pandas 1.5).
    • It's mostly the same with FTP. If you use storbinary, the file is uploaded as is. If you use storlines, the upload is done in ASCII mode, so you let the FTP server convert the line endings.
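
To make the first two points concrete, here is a small standard-library sketch (not what Pandas does internally, just an illustration of the same idea). It layers a text interface over a file opened in binary mode using io.TextIOWrapper; the sample text and the choice of newline="\r\n" are arbitrary assumptions for the demo.

```python
import io
import tempfile

# A temporary file is opened in binary mode ('w+b') by default.
with tempfile.TemporaryFile() as raw:
    # Layer a text interface on top of the binary file. The wrapper does
    # what text mode does: it encodes str to bytes and translates '\n'
    # according to the newline argument.
    text = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")
    text.write("näme,value\n")
    text.flush()   # push the encoded bytes down to the binary file
    text.detach()  # release raw so the wrapper no longer owns it

    raw.seek(0)
    data = raw.read()  # the bytes that physically ended up in the file

# data == b'n\xc3\xa4me,value\r\n'
```

The "ä" became two UTF-8 bytes and the "\n" became "\r\n". Reading the file back in binary mode, as storbinary does, returns exactly those bytes with no further conversion.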