pandas.DataFrame.to_markdown transforms large int values to float. Is it a bug or a feature? Are there any solutions?
>>> df = pd.DataFrame({"A": [123456, 123456]})
>>> print(df.to_markdown())
|    |      A |
|---:|-------:|
|  0 | 123456 |
|  1 | 123456 |
>>> df = pd.DataFrame({"A": [1234567, 1234567]})
>>> print(df.to_markdown())
|    |           A |
|---:|------------:|
|  0 | 1.23457e+06 |
|  1 | 1.23457e+06 |
>>> print(df)
         A
0  1234567
1  1234567
>>> print(df.A.dtype)
int64
I initially found only a workaround, not an explanation: convert the column to strings.
>>> df = pd.DataFrame({"A": [1234567, 1234567]})
>>> df["A"] = df.A.astype(str)
>>> print(df.to_markdown())
|    |       A |
|---:|--------:|
|  0 | 1234567 |
|  1 | 1234567 |
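If several integer columns are affected, the same workaround can be applied to all of them at once. A minimal sketch (the select_dtypes-based selection is my own extension, not from the original workaround):
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1234567, 1234567], "B": [7654321, 7654321]})
>>> int_cols = df.select_dtypes(include="integer").columns
>>> df[int_cols] = df[int_cols].astype(str)  # each int column is rendered digit-for-digit
>>> print(df.to_markdown())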
Update:
I think it is caused by 2 elements:
1. The _column_type function in tabulate:
def _column_type(strings, has_invisible=True, numparse=True):
    """The least generic type all column values are convertible to.
    ...
    """
It can be solved by disabling the conversion via tablefmt="pretty" (which turns off tabulate's number parsing):
print(df.to_markdown(tablefmt="pretty"))
+---+---------+
|   |    A    |
+---+---------+
| 0 | 1234567 |
| 1 | 1234567 |
+---+---------+
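Note that tablefmt="pretty" also gives up the Markdown pipe layout. Since to_markdown forwards extra keyword arguments to tabulate, passing tabulate's disable_numparse option should skip the number re-parsing while keeping the pipe format; a sketch (I have not verified the exact rendering):
print(df.to_markdown(disable_numparse=True))  # pipe table, values stringified as-is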
2. float numbers. Since tabulate uses df.values to extract the data, the DataFrame is converted to a numpy.array, and a numpy array holds a single dtype: mixing int and float columns therefore upcasts the ints to float. This is also discussed in this issue.
>>> df = pd.DataFrame({"A": [1234567, 1234567], "B": [0.1, 0.2]})
>>> print(df)
         A    B
0  1234567  0.1
1  1234567  0.2
>>> print(df.A.dtype)
int64
>>> print(df.to_markdown(tablefmt="pretty"))
+---+-----------+-----+
|   |     A     |  B  |
+---+-----------+-----+
| 0 | 1234567.0 | 0.1 |
| 1 | 1234567.0 | 0.2 |
+---+-----------+-----+
>>> df.values
array([[1.234567e+06, 1.000000e-01],
       [1.234567e+06, 2.000000e-01]])
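Given that, one more workaround for mixed frames (my own sketch): cast the frame to object dtype before rendering, so that df.values keeps each cell's original Python type and no upcast happens:
>>> df.astype(object).values
array([[1234567, 0.1],
       [1234567, 0.2]], dtype=object)
>>> print(df.astype(object).to_markdown())  # A should now render as 1234567, not 1234567.0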