I've started using polars recently (https://docs.pola.rs/api/python/stable/reference/index.html)
I have a column in my data frame that contains single element arrays (output of a keras model.predict):
X
object
[0.49981183]
[0.49974033]
[0.4997973]
[0.49973667]
[0.49978396]
I want to convert this into a column of floats:
0.49981183
0.49974033
0.4997973
0.49973667
0.49978396
I've tried:
data = data.with_columns((pl.col("X")[0]).alias("Y"))
but it gives me this error:
TypeError: 'Expr' object is not subscriptable
What's the right way to do this? There are around 67 million rows so the faster the better
Cheers
Unfortunately, columns of type Object are often a dead end. From the Data Types section of the Polars User Guide:
Object: A limited supported data type that can be any value.
Since support is limited, operations on columns of type Object often throw exceptions.
However, there may be a way to retrieve the values in this particular situation. As an example, let's purposely create a column of type Object.
import polars as pl
data_as_list = [[0.49981183], [0.49974033],
[0.4997973], [0.49973667], [0.49978396]]
df = pl.DataFrame(
pl.Series("X", values=data_as_list, dtype=pl.Object)
)
print(df)
shape: (5, 1)
┌──────────────┐
│ X            │
│ ---          │
│ object       │
╞══════════════╡
│ [0.49981183] │
│ [0.49974033] │
│ [0.4997973]  │
│ [0.49973667] │
│ [0.49978396] │
└──────────────┘
This approach may work...
def attempt_recover(series: pl.Series) -> pl.Series:
    # Pull the single element out of each object in plain Python (slow, row by row)
    return pl.Series(values=[val[0] for val in series])
df.with_columns(pl.col("X").map_batches(attempt_recover).alias("X_recovered"))
shape: (5, 2)
┌──────────────┬─────────────┐
│ X            ┆ X_recovered │
│ ---          ┆ ---         │
│ object       ┆ f64         │
╞══════════════╪═════════════╡
│ [0.49981183] ┆ 0.499812    │
│ [0.49974033] ┆ 0.49974     │
│ [0.4997973]  ┆ 0.4997973   │
│ [0.49973667] ┆ 0.499737    │
│ [0.49978396] ┆ 0.499784    │
└──────────────┴─────────────┘
Try this first on a tiny subset of your data. This may not work. (And it will not be fast.)
What you'll really want to do is change how the Keras prediction results are loaded into Polars, so that you never get a column of type Object in the first place. (Often this means indexing or flattening the array/list output to extract the number before loading it into Polars.)