Hi, I want to select those columns of a polars DataFrame that are of the dtype List.
Selecting by dtype usually works fine, e.g. df.select(pl.col(pl.Utf8)).
However, for the List type this does not seem to work:
import polars as pl
df = pl.DataFrame({"foo": [[c] for c in
["100CT pen", "pencils 250CT", "what 125CT soever", "this is a thing"]]}
)
df
Output:
foo
list[str]
["100CT pen"]
["pencils 250CT"]
["what 125CT soever"]
["this is a thing"]
df.select(pl.col(pl.List))
Output:
shape: (0, 0)
You need to provide the type of the items in the List, unlike with primitive types (where print(df.select(pl.col(pl.Int64))) would work in the example below):
import polars as pl
df = pl.DataFrame({
"foo": [[c] for c in
["100CT pen", "pencils 250CT", "what 125CT soever", "this is a thing"]],
"bar": [1, 2, 3, 4]
}
)
print(df.select(pl.col(pl.List(str))))
I can't seem to find anything that's generic across the types that a List can contain. There is a NESTED_DTYPES constant in polars, and another answer suggests that you might be able to use it in a more "catch-all" manner, but it doesn't seem to work if you want to grab columns of a nested type regardless of the type of data they contain.
Thanks to @jqurious for pointing out that this seems to be a requested feature in an open ticket. This has an interesting use case for me: the only reason I've switched DataFrames back to pandas recently is that polars refuses to write List columns to CSV, so I either filter out all such columns by name or, if this is implemented, I could drop them in one go. I didn't create those columns and I don't want them in the output.