
How to fill a column with random values in polars


I would like to know how to fill a column of a Polars DataFrame with random values. I have a DataFrame with a given number of columns, and I want to add a column in which each row holds a different random value (obtained, for example, from the random.random() function).

This is what I have tried so far:

import random
import polars as pl

df = df.with_columns(
    pl.when(pl.col('Q') > 0).then(random.random()).otherwise(pl.lit(1)).alias('Prob')
)

With this approach, I get a column filled with a single random value, i.e. every row has the same value.

Is there a way to fill the column with different random values?

Thanks in advance.


Solution

  • You need a "column" of random numbers the same height as your dataframe?

    np.random.rand could be useful for this:

    import numpy as np
    import polars as pl

    df = pl.DataFrame({"foo": [1, 2, 3]})
    
    df.with_columns(pl.lit(np.random.rand(df.height)).alias("prob"))
    
    shape: (3, 2)
    ┌─────┬──────────┐
    │ foo ┆ prob     │
    │ --- ┆ ---      │
    │ i64 ┆ f64      │
    ╞═════╪══════════╡
    │ 1   ┆ 0.657389 │
    │ 2   ┆ 0.616265 │
    │ 3   ┆ 0.142611 │
    └─────┴──────────┘
    
    df.with_columns(
       # rows that fail the condition get null, since there is no .otherwise()
       pl.when(pl.col("foo") > 2).then(pl.lit(np.random.rand(df.height)))
         .alias("prob")
    )
    
    shape: (3, 2)
    ┌─────┬──────────┐
    │ foo ┆ prob     │
    │ --- ┆ ---      │
    │ i64 ┆ f64      │
    ╞═════╪══════════╡
    │ 1   ┆ null     │
    │ 2   ┆ null     │
    │ 3   ┆ 0.686551 │
    └─────┴──────────┘
    

    It is also possible to do something similar with expressions alone, e.g. with pl.int_range() and .sample():

    df.with_columns(
       (pl.int_range(1000).sample(pl.len(), with_replacement=True) / 1000)
          .alias("prob")
    )
    
    shape: (3, 2)
    ┌─────┬───────┐
    │ foo ┆ prob  │
    │ --- ┆ ---   │
    │ i64 ┆ f64   │
    ╞═════╪═══════╡
    │ 1   ┆ 0.288 │
    │ 2   ┆ 0.962 │
    │ 3   ┆ 0.734 │
    └─────┴───────┘