How to ensure that Polars creates a column of type List rather than type Object

The code below creates a column called paid that looks like a list but has dtype Object, which makes it practically useless as a column. How can I ensure that the created column is a List column rather than an Object column, given that .cast() cannot be applied to the Object column after it has been created?

import numpy as np
import polars as pl
import scipy.stats as stats

CLUSTERS = 200 
MEAN_TRIALS = 20
MU = 0.5
SIGMA = 0.1

df_cluster = pl.DataFrame({'cluster_id': range(1, CLUSTERS+1)}) 

df_cluster = df_cluster.with_columns(
    mu = stats.truncnorm(a=0, b=1, loc=MU, scale=SIGMA).rvs(size=CLUSTERS),
    trials = np.random.poisson(lam=MEAN_TRIALS, size=CLUSTERS)
)

df_cluster = df_cluster.with_columns(
    pl.struct(["mu", "trials"])
    .map_elements(lambda x: np.random.binomial(n=1, p=x['mu'], size=x['trials']))
    .alias('paid')
)

df_cluster.head()

Solution

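The lambda returns a NumPy ndarray, and that is what ends up stored as an Object. You can convert the ndarray to a Python list with list() before returning it, so that Polars infers a List dtype: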

...
df_cluster = df_cluster.with_columns(
    pl.struct(["mu", "trials"])
    .map_elements(
        lambda x: list(np.random.binomial(n=1, p=x['mu'], size=x['trials']))
    )
    .alias('paid')
)

df_cluster.head()

┌────────────┬──────────┬────────┬─────────────┐
│ cluster_id ┆ mu       ┆ trials ┆ paid        │
│ ---        ┆ ---      ┆ ---    ┆ ---         │
│ i64        ┆ f64      ┆ i32    ┆ list[i32]   │
╞════════════╪══════════╪════════╪═════════════╡
│ 1          ┆ 0.508726 ┆ 25     ┆ [1, 0, … 1] │
│ 2          ┆ 0.513275 ┆ 26     ┆ [1, 1, … 0] │
│ 3          ┆ 0.57244  ┆ 22     ┆ [1, 0, … 1] │
│ 4          ┆ 0.556384 ┆ 15     ┆ [0, 0, … 0] │
│ 5          ┆ 0.51955  ┆ 15     ┆ [1, 1, … 1] │
└────────────┴──────────┴────────┴─────────────┘