Search code examples
pythonpython-3.xcsvlarge-files

What is an appropriate choice to store very large files with python? .csv files truncated data in certain cells


I'm writing a python script for the data acquisition phase of my project and so far I've been storing data in .csv files. While I was reading data from a particular .csv file, I got the error:

syntaxError: EOL while scanning string literal

I took a look at the specific row in the file and and the data in the specific cell were truncated. I am using pandas to store dicts to csv and it never threw an error. I guess .csv will save itself no matter what, even if that means it will delete data without any warning.

I thought of changing to .xls. When the same row was being stored, an error came up saying (something along the lines of):

Max character length reached. Max character length per cell was ~32k.

Then I thought that it may just be an excel/libreoffice calc issue (I tried both) and they can't visualize the data in cell but they are actually there. So I tried printing the specific cell; data were indeed truncated. The specific cell contains a dict, whose values are float, int, boolean or string. However, all of them have been converted to strings.

My question is, is there a way to fix it without changing the file format?

In the case that I have to change the file format, what would be an appropriate choice to store very large files? I am thinking about hdf5.

In case you need more info, do let me know. Thank you!


Solution

  • There is a limit to fields size:

    csv.field_size_limit([new_limit])
    Returns the current maximum field size allowed by the parser. 
    If new_limit is given, this becomes the new limit.
    

    On my system (Python 3.8.0), I get:

    >>> import csv
    >>> csv.field_size_limit()
    131072
    

    which is exactly 128 kB.

    You could try to set the limit higher:

     csv.field_size_limit(your_new_limit)
    

    But maybe a different file format would be more adapted depending on what kind of data you store.