In polars, we can use .to_numpy() to convert a polars.DataFrame into a numpy.ndarray. But if the data contains None values, polars stores them as null, and to_numpy() then converts each null to np.nan, which turns an integer array into a float array.
import polars as pl

t = pl.DataFrame({
    'a': [1, 4, 3, 5],
    'b': [3, 6, 2, None],
})
print(t.to_numpy())

[[ 1.  3.]
 [ 4.  6.]
 [ 3.  2.]
 [ 5. nan]]
How can I avoid this and get None instead of NaN when converting a DataFrame to an ndarray?
numpy's numeric dtypes have no null value (float arrays only have NaN, and integer arrays have no missing-value marker at all), so you can't.
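To see why, here is a small sketch using only numpy (the variable names are just for illustration). It shows what numpy does when you try to put a missing value next to integers: it either upcasts to float, falls back to the object dtype, or refuses outright.

```python
import numpy as np

# Mixing ints with NaN upcasts the whole array to float
float_arr = np.array([1, 2, np.nan])
print(float_arr.dtype)  # float64

# Mixing ints with None falls back to a generic object array
obj_arr = np.array([1, 2, None])
print(obj_arr.dtype)  # object

# An existing integer array cannot store NaN at all
int_arr = np.array([1, 2, 3])
try:
    int_arr[0] = np.nan
except ValueError as exc:
    print("ValueError:", exc)
```

So the float conversion you see from to_numpy() is the same upcast numpy itself would do.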
If you really need None in a numpy array, you could cast to pl.Object first:
import polars as pl

t = pl.DataFrame({
    'a': [1, 4, 3, 5],
    'b': [3, 6, 2, None],
}, schema_overrides={'b': pl.Object})

print(t.to_numpy())

[[1 3]
 [4 6]
 [3 2]
 [5 None]]
That said, numpy doesn't really handle missing data. I'd suggest imputing the missing values first and then converting to numpy.