I would like to know how to fill a column of a polars dataframe with random values. I have a dataframe with a given number of columns, and I want to add a column filled with a different random value in each row (obtained from random.random(), for example).
This is what I have tried so far:
df = df.with_columns(
    pl.when(pl.col('Q') > 0).then(random.random()).otherwise(pl.lit(1)).alias('Prob')
)
With this approach, the resulting column is filled with a single random value, i.e. every row has the same value.
Is there a way to fill the column with a different random value in each row?
Thanks in advance.
You need a "column" of random numbers the same height as your dataframe?
np.random.rand could be useful for this:
import numpy as np
import polars as pl
df = pl.DataFrame({"foo": [1, 2, 3]})
df.with_columns(pl.lit(np.random.rand(df.height)).alias("prob"))
shape: (3, 2)
┌─────┬──────────┐
│ foo ┆ prob     │
│ --- ┆ ---      │
│ i64 ┆ f64      │
╞═════╪══════════╡
│ 1   ┆ 0.657389 │
│ 2   ┆ 0.616265 │
│ 3   ┆ 0.142611 │
└─────┴──────────┘
df.with_columns(
    pl.when(pl.col("foo") > 2).then(pl.lit(np.random.rand(df.height)))
    .alias("prob")
)
shape: (3, 2)
┌─────┬──────────┐
│ foo ┆ prob     │
│ --- ┆ ---      │
│ i64 ┆ f64      │
╞═════╪══════════╡
│ 1   ┆ null     │
│ 2   ┆ null     │
│ 3   ┆ 0.686551 │
└─────┴──────────┘
It may also be possible to do something similar using only expressions, e.g. with pl.int_range() and .sample():
df.with_columns(
    (pl.int_range(1000).sample(pl.len(), with_replacement=True) / 1000)
    .alias("prob")
)
shape: (3, 2)
┌─────┬───────┐
│ foo ┆ prob  │
│ --- ┆ ---   │
│ i64 ┆ f64   │
╞═════╪═══════╡
│ 1   ┆ 0.288 │
│ 2   ┆ 0.962 │
│ 3   ┆ 0.734 │
└─────┴───────┘