Tags: python, type-conversion, python-polars

Polars cast pl.Object to pl.String: polars.exceptions.ComputeError: cannot cast 'Object' type


Update: The output of numpy.random.choice is no longer parsed as an Object type. The example now produces a String column as expected, with no cast needed.


I have a pl.LazyFrame with a column of type Object that contains date representations as well as missing values (None).
As a first step, I would like to convert the column from Object to String, but this results in a ComputeError and I cannot figure out why. I suspect the None values are the cause; unfortunately, I cannot drop them at this point.

import numpy as np
import polars as pl

rng = np.random.default_rng(12345)
df = pl.LazyFrame(
    data={
        "date": rng.choice(
            [None, "03.04.1998", "03.05.1834", "05.06.2025"], 100
        ),
    }
)
df.with_columns(pl.col("date").cast(pl.String)).collect()

Solution

  • When Polars assigns the pl.Object type it essentially means: "I do not understand what this is."

    By the time you end up with this type, it is generally too late to do anything useful with it.

    In this particular case, numpy.random.choice is creating a NumPy array of dtype=object, because the input list mixes None with strings:

    >>> rng.choice([None, "foo"], 3)
    array([None, None, 'foo'], dtype=object)
    

    Polars has native .sample() functionality which you could use to create your data instead.

    df = pl.select(
        date=pl.Series([None, "03.04.1998", "03.05.1834", "05.06.2025"])
        .sample(100, with_replacement=True)
    )
    
    # shape: (100, 1)
    # ┌────────────┐
    # │ date       │
    # │ ---        │
    # │ str        │
    # ╞════════════╡
    # │ null       │
    # │ 05.06.2025 │
    # │ 03.05.1834 │
    # │ 03.04.1998 │
    # │ …          │
    # │ null       │
    # │ 03.04.1998 │
    # │ 03.05.1834 │
    # │ 03.04.1998 │
    # └────────────┘