
Polars DataFrame changes null to np.nan in an Int column when using .to_numpy()


In Polars, we can use .to_numpy() to convert a polars.DataFrame into a numpy.ndarray. But if there are None values, Polars stores them as null, and when to_numpy() is called, the null values are converted to np.nan, which turns an integer array into a float array.

import polars as pl

t = pl.DataFrame({
    'a': [1, 4, 3, 5],
    'b': [3, 6, 2, None],
})

print(t.to_numpy())

[[ 1.  3.]
 [ 4.  6.]
 [ 3.  2.]
 [ 5. nan]]

How can I avoid this and get None instead of NaN when converting a DataFrame to an ndarray?


Solution

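  • numpy has no null for numeric dtypes: integer arrays cannot represent a missing value at all, and float arrays can only use NaN. So with a numeric dtype, you can't.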

    If you really need None in a numpy array, you could cast to pl.Object first:

    In [42]: import polars as pl
        ...:
        ...: t = pl.DataFrame({
        ...:     'a': [1, 4, 3, 5],
        ...:     'b': [3, 6, 2, None],
        ...: }, schema_overrides={'b': pl.Object})
        ...:
        ...: print(t.to_numpy())
    [[1 3]
     [4 6]
     [3 2]
     [5 None]]
    

    But numpy doesn't really handle missing data, so I'd suggest you first impute your missing data and then convert to numpy.