I have a polars.DataFrame object data_frame with multiple columns, both strings and non-strings (as follows), and I want to cast all columns to strings:
import polars as pl
import polars.selectors as cs
data_frame = pl.DataFrame({'a': ['a', 'b', 'c'], 'b': range(3), 'c': [.1, .2, .3]})
non_string_columns = [col for col in data_frame.columns if data_frame[col].dtype != pl.Utf8]
for col in non_string_columns:
    data_frame = data_frame.with_columns(pl.col(col).cast(pl.Utf8))
However, this should be possible with the cs selector as well, something like:
data_frame.with_columns(~cs.string().as_expr().cast(pl.Utf8))
which does not work and raises polars.exceptions.SchemaError: invalid series dtype: expected Boolean, got str
What is the way to cast many columns to string at once (utilising the polars parallelism) with the cs selector?
The ~ is applied last in the order of operations, so it tries to negate the string expression produced by the cast instead of the selector. Force the right order with some extra parentheses:
data_frame.with_columns((~cs.string()).cast(pl.Utf8))
(No need for as_expr here, either.)