I want to read a CSV file in which some columns are strings of this type: "[1.0, 0.0, -2.3]".
I want to parse these strings as lists of floats, so I use this code:
import polars as pl
df = pl.scan_csv(
    "foo.csv",
).with_columns(
    pl.col("^list.*$").str.strip("[ ]").str.split(",").cast(pl.List(pl.Float64)),
).collect()
Some of the strings are empty lists: "[]"; when I parse them, I get a list like this one: [null].
I would like to remove all the rows in which any of the lists contains a null element. This is the code I tried:
df.filter(
    pl.all_horizontal(
        ~pl.col("^list.*$").list.contains(None)
    )
)
But it gives me this error:
InvalidOperationError: `is_in` operation not supported for dtype `list[f64]`
I provide a small example file so you can test the code yourself:
list1,list2,list3
"[]","[0.0, 1.2]","[3.4]"
"[102, 506]","[12, 5.2]","[2.3]"
"[15]","[0.5, 8.2]","[]"
In this example, only the second row should be kept in the DataFrame.
I think you want to replace (or, more robustly, replace_all) the spaces, as your current code loses the 506: after splitting, " 506" keeps its leading space and the cast turns it into null.
df = pl.read_csv(b"""
list1,list2,list3
"[]","[0.0, 1.2]","[3.4]"
"[102, 506]","[12, 5.2]","[2.3]"
"[15]","[0.5, 8.2]","[]"
""")
If you filter first you can avoid the dtype issue:
(df.filter(pl.all_horizontal(pl.all() != "[]"))
   .with_columns(
       pl.all().str.replace_all(" ", "")
          .str.strip_chars("[]")
          .str.split(",")
          .cast(pl.List(pl.Float64))
   )
)
shape: (1, 3)
┌────────────────┬─────────────┬───────────┐
│ list1 ┆ list2 ┆ list3 │
│ --- ┆ --- ┆ --- │
│ list[f64] ┆ list[f64] ┆ list[f64] │
╞════════════════╪═════════════╪═══════════╡
│ [102.0, 506.0] ┆ [12.0, 5.2] ┆ [2.3] │
└────────────────┴─────────────┴───────────┘