I'm writing a Python script for the data acquisition phase of my project, and so far I've been storing the data in .csv files. While I was reading data from a particular .csv file, I got the error:
SyntaxError: EOL while scanning string literal
I took a look at the specific row in the file, and the data in that cell were truncated. I am using pandas to store dicts to .csv, and it never threw an error. I guess .csv will save itself no matter what, even if that means silently dropping data.
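For reference, the writing side looks roughly like this (simplified; the column and file names are placeholders, and the dict here stands in for my much larger one):

import pandas as pd

# Simplified sketch: each record is a dict, stored in a single cell
# as its string representation.
record = {"value": 3.14, "count": 42, "ok": True, "label": "sample"}
df = pd.DataFrame({"payload": [str(record)]})
df.to_csv("data.csv", index=False)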
I thought of switching to .xls. When the same row was being stored, an error came up saying something along the lines of:
Max character length reached. Max character length per cell was ~32k.
Then I thought that it might just be a display issue in Excel/LibreOffice Calc (I tried both), i.e. they couldn't render the full contents of the cell even though the data were actually there. So I tried printing the specific cell: the data were indeed truncated. The cell contains a dict whose values are floats, ints, booleans, or strings; however, all of them have been converted to strings.
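To double-check outside of any spreadsheet app, the raw file can be inspected directly (a sketch; the file name is a placeholder):

# Inspect the raw line lengths without any CSV parsing, to see
# how much data is actually on disk.
with open("data.csv", encoding="utf-8") as f:
    for i, line in enumerate(f):
        print(i, len(line))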
My question is: is there a way to fix this without changing the file format?
If I do have to change the file format, what would be an appropriate choice for storing very large files? I am thinking about HDF5.
In case you need more info, do let me know. Thank you!
There is a limit on field size in the csv module:
csv.field_size_limit([new_limit]) Returns the current maximum field size allowed by the parser. If new_limit is given, this becomes the new limit.
On my system (Python 3.8.0), I get:
>>> import csv
>>> csv.field_size_limit()
131072
which is exactly 128 kB (131072 = 128 × 1024 bytes).
You could try to set the limit higher:
csv.field_size_limit(your_new_limit)
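One caveat: on some platforms, passing sys.maxsize directly raises OverflowError, because the limit has to fit in a C long. A common workaround (a sketch, not something from the csv docs) is to back off until a value is accepted:

import csv
import sys

# Try the largest possible value first, then back off until the
# platform's C long can hold it.
limit = sys.maxsize
while True:
    try:
        csv.field_size_limit(limit)
        break
    except OverflowError:
        limit //= 10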
But depending on the kind of data you store, a different file format may be better suited.
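Since you already mention HDF5, here is a minimal sketch with pandas, assuming the optional PyTables dependency is installed (pip install tables); the file and key names are placeholders:

import pandas as pd

# Round-trip a field far beyond the csv module's default 128 kB limit.
df = pd.DataFrame({"payload": ["x" * 200_000]})
df.to_hdf("data.h5", key="acquisition", mode="w")

df_back = pd.read_hdf("data.h5", key="acquisition")
assert df_back.loc[0, "payload"] == "x" * 200_000  # nothing truncated

Unlike .csv or .xls, HDF5 has no per-field character limit of this kind, so oversized cells round-trip intact.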