In Jupyter, I have a dataframe of 400,000 objects that I can't export in full to a JSON file without hitting the error below.
The export works fine as long as I limit it to the first 141,000 objects, whatever the order of those first objects.
Is there any size limitation I should be aware of when dealing with large JSON files? Thank you.
OverflowError Traceback (most recent call last)
<ipython-input-254-b59373f1eeb2> in <module>
----> 1 df4.to_json('test.json', orient = 'records')
~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in to_json(self, path_or_buf, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines, compression, index)
1889 default_handler=default_handler,
1890 lines=lines, compression=compression,
-> 1891 index=index)
1892
1893 def to_hdf(self, path_or_buf, key, **kwargs):
~/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py in to_json(path_or_buf, obj, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines, compression, index)
56 double_precision=double_precision, ensure_ascii=force_ascii,
57 date_unit=date_unit, default_handler=default_handler,
---> 58 index=index).write()
59
60 if lines:
~/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py in write(self)
99 return self._write(self.obj, self.orient, self.double_precision,
100 self.ensure_ascii, self.date_unit,
--> 101 self.date_format == 'iso', self.default_handler)
102
103 def _write(self, obj, orient, double_precision, ensure_ascii,
~/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py in _write(self, obj, orient, double_precision, ensure_ascii, date_unit, iso_dates, default_handler)
154 double_precision,
155 ensure_ascii, date_unit,
--> 156 iso_dates, default_handler)
157
158
~/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py in _write(self, obj, orient, double_precision, ensure_ascii, date_unit, iso_dates, default_handler)
110 date_unit=date_unit,
111 iso_dates=iso_dates,
--> 112 default_handler=default_handler
113 )
114
OverflowError: int too big to convert
There is no inherent limitation on data size in JSON, so that isn't your problem: the message suggests some difficulty with a particular integer value.
This underlines the difficulty of working with such large files, since you now have to isolate the particular record that's causing the problem in the middle of the to_json call.
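As a minimal sketch (illustrative values, not your data): pandas' JSON writer handles integers as 64-bit values, so an object-dtype column holding a Python int outside that range can, if this matches your case, reproduce the same error:

```python
import pandas as pd

# 2**64 does not fit in a 64-bit integer, so pandas stores the
# column with object dtype and keeps the value as a Python int.
df = pd.DataFrame({'value': [1, 2, 2**64]})

# Serializing is then expected to raise
# "OverflowError: int too big to convert".
df.to_json(orient='records')
```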
Since you know roughly where the problem is, you could try converting subsets of your dataframe with a bisection technique to home in on the row that's causing the issue, as in the sketch below.
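For example (a sketch, assuming your frame is named df4 as in the traceback, and that at least one row fails to serialize):

```python
def find_bad_row(df, lo=0, hi=None):
    """Bisect the positional range [lo, hi) to locate one row
    whose serialization raises OverflowError."""
    if hi is None:
        hi = len(df)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        try:
            # If the first half serializes cleanly, the offending
            # row must be in the second half.
            df.iloc[lo:mid].to_json(orient='records')
            lo = mid
        except OverflowError:
            # Otherwise keep narrowing the first half.
            hi = mid
    return lo  # positional index of an offending row

bad = find_bad_row(df4)
print(df4.iloc[bad])
```

Note this finds one offending row per run; if several rows carry oversized values, you'd repeat the search on the remaining data. Once the row is located, a common workaround is to cast the offending column to string before exporting, e.g. df4['col'] = df4['col'].astype(str), where 'col' stands for whatever column holds the oversized value.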