
Convert all non-string columns to string


I have a polars.DataFrame object data_frame with multiple columns, some strings and some non-strings (as shown below), and I want to cast all columns to strings:

import polars as pl
import polars.selectors as cs
data_frame = pl.DataFrame({'a': ['a', 'b', 'c'], 'b': range(3), 'c': [.1, .2, .3]})

non_string_columns = [col for col in data_frame.columns if data_frame[col].dtype != pl.Utf8]
for col in non_string_columns:
    data_frame = data_frame.with_columns(pl.col(col).cast(pl.Utf8))

However, this should be possible with the cs selector as well, something like:

data_frame.with_columns(~cs.string().as_expr().cast(pl.Utf8))

which does not cut it and raises polars.exceptions.SchemaError: invalid series dtype: expected Boolean, got str

What is the way to cast many columns at once into strings (utilising the polars parallelism) with the cs selector?


Solution

  • In Python, method calls bind more tightly than the unary ~ operator, so the ~ comes last in the order of operations: it tries to negate the string expression produced by .cast(pl.Utf8) instead of the selector. Force the right order with some extra parentheses:

    data_frame.with_columns((~cs.string()).cast(pl.Utf8))
    

    (No need for as_expr here, either.)