Consider the following minimal example:
from dataclasses import dataclass

import numpy
import pandas


@dataclass
class ExportEngine:
    def __post_init__(self):
        self.list = pandas.DataFrame(columns=list(MyObject.CSVHeaders()))

    def export(self):
        self.prepare()
        self.list.to_csv("~/Desktop/test.csv")

    def prepare(self):
        values = numpy.concatenate(
            (
                numpy.array(["Col1Value", "Col2Value", " Col3Value", "Col4Value"]),
                numpy.repeat("", 24),
            )
        )
        for x in range(8):  # not the best way, but done due to other constraints
            start = 3 + (x * 3) - 2
            end = start + 3
            values[start:end] = [
                "123",
                "some_random_value_that_gets_truncated",
                "456",
            ]
        self.list.loc[len(self.list)] = values
When export() is called, some_random_value_that_gets_truncated is truncated to some_rando:
['Col1Value', '123', 'some_rando', '456', '123', 'some_rando', '456', '123', 'some_rando', '456', '123', 'some_rando', '456', '123', ...]
I've tried setting pandas.set_option("display.max_colwidth", 10000), but this doesn't change anything.
Why does this happen, and how can I prevent the truncation?
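By default, numpy picks a fixed-width Unicode dtype for string arrays, sized to the longest string present when the array is created. Here the longest string is " Col3Value" (10 characters), so every element is limited to 10 characters, and anything longer assigned later is silently cut to fit. Notice the dtype: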
In [1]: import numpy

In [2]: values = numpy.concatenate(
   ...:     (
   ...:         numpy.array(["Col1Value", "Col2Value", " Col3Value", "Col4Value"]),
   ...:         numpy.repeat("", 24),
   ...:     )
   ...: )

In [3]: values
Out[3]:
array(['Col1Value', 'Col2Value', ' Col3Value', 'Col4Value', '', '', '',
       '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', ''], dtype='<U10')
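To see the truncation in isolation, here is a minimal sketch that builds the same array as in the question and assigns one long string to it. The variable names dtype_name and truncated are only for illustration.

```python
import numpy

# Same construction as in the question: the longest input string
# (" Col3Value", 10 characters) fixes the element width at 10.
values = numpy.concatenate(
    (
        numpy.array(["Col1Value", "Col2Value", " Col3Value", "Col4Value"]),
        numpy.repeat("", 24),
    )
)
dtype_name = values.dtype.str  # '<U10' on a little-endian machine

# Assigning a longer string does not raise an error;
# numpy silently keeps only the first 10 characters.
values[1] = "some_random_value_that_gets_truncated"
truncated = str(values[1])

print(dtype_name, repr(truncated))
```

The truncation happens at assignment time inside the numpy array, before pandas ever sees the data, which is why pandas display options have no effect on it.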
You should probably just not use numpy directly, but one quick fix is to replace:
values = numpy.concatenate(
    (
        numpy.array(["Col1Value", "Col2Value", " Col3Value", "Col4Value"]),
        numpy.repeat("", 24),
    )
)
with:
values = numpy.array(
    ["Col1Value", "Col2Value", " Col3Value", "Col4Value", *[""] * 24],
    dtype=object,
)
Notice the dtype=object, which makes the array hold references to ordinary Python str objects instead of fixed-width character buffers, so there is no limit on the length of the strings.
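As a rough sketch of both options, the snippet below rebuilds the row from the question once with dtype=object and once with a plain Python list (which is one way to "not use numpy directly"). The function names build_row_object_array and build_row_plain_list are hypothetical, and the loop is copied from the question's prepare() method.

```python
import numpy

LONG_VALUE = "some_random_value_that_gets_truncated"


def build_row_object_array():
    """Build the row with dtype=object so strings keep their full length."""
    values = numpy.array(
        ["Col1Value", "Col2Value", " Col3Value", "Col4Value", *[""] * 24],
        dtype=object,
    )
    for x in range(8):
        start = 3 + (x * 3) - 2
        end = start + 3
        values[start:end] = ["123", LONG_VALUE, "456"]
    return values


def build_row_plain_list():
    """Build the same row from a plain list, with no numpy involved."""
    values = ["Col1Value", "Col2Value", " Col3Value", "Col4Value"] + [""] * 24
    for x in range(8):
        start = 3 + (x * 3) - 2
        end = start + 3
        # Replacing a 3-element slice with 3 items keeps the list length at 28.
        values[start:end] = ["123", LONG_VALUE, "456"]
    return values


object_row = build_row_object_array()
list_row = build_row_plain_list()
print(object_row[2])
print(list(object_row) == list_row)
```

Either result can be assigned to self.list.loc[len(self.list)] in the same way as the original array, provided the number of values matches the number of columns in the DataFrame.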