python, python-polars

Add a constant to an existing column


DataFrame:

import numpy as np
import polars as pl

rng = np.random.default_rng(42)

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": rng.random(5),
        "A": [True, True, False, False, False],
    }
)

Currently, to add a constant to a column I do:

df = df.with_columns(pl.col('random') + 500.0)

Questions:

  1. Why does df = df.with_columns(pl.col('random') += 500.0) throw a SyntaxError?

  2. Various AIs tell me that df['random'] = df['random'] + 500 should also work, but it throws the following error instead:

    TypeError: DataFrame object does not support `Series` assignment by index
    
    Use `DataFrame.with_columns`.
    

    Why is polars throwing an error? I've been using df['random'] to identify the random column in other parts of my code, and it worked.


Solution

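  • AIs tell you to do so because they are not actually intelligent

    They suggest how it is done in pandas, because of shared keywords such as dataframe and python. polars, by design, does not work the same way: reading df['random'] returns the column as a Series, but assigning a Series through the index is not supported, and the error message points you to DataFrame.with_columns instead.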

    Augmented assignment problem

    With +=, it really is just a matter of syntax. pl.col is a callable object (of type <class 'polars.functions.col.ColumnFactory'>), and calling it creates an expression (<Expr ['col("random")'] at 0x701D91A7D850>). An augmented assignment needs a target it can assign the result back to, and the result of a call is not one, loosely similar to how this fails before a exists:

    a += 1
    

    Or, more precisely: += is a statement, not an expression, so it cannot appear inside the argument list of any function call:

    >>> import math
    >>> a = 10
    >>> math.pow(a += 10, 2)
      File "<stdin>", line 1
        math.pow(a += 10, 2)
                   ^^
    SyntaxError: invalid syntax
    >>> math.pow(a + 10, 2)
    400.0
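
The same point can be checked without polars. A minimal sketch using only the standard library: the built-in compile() accepts a += 10 as a statement, rejects it wherever an expression is required, and rejects an augmented assignment whose target is a call result. The helper name compiles is made up for this illustration.

```python
def compiles(source: str, mode: str) -> bool:
    """Return True if `source` is syntactically valid in the given compile mode."""
    try:
        compile(source, "<example>", mode)
        return True
    except SyntaxError:
        return False


# As a standalone statement, augmented assignment is valid syntax.
print(compiles("a += 10", "exec"))
# As an expression (which is what a function argument must be), it is not.
print(compiles("a += 10", "eval"))
print(compiles("pow(a += 10, 2)", "exec"))
# The result of a call is not a valid assignment target either.
print(compiles("f(x) += 1", "exec"))
# The plain + operator forms an expression, so it is fine as an argument.
print(compiles("pow(a + 10, 2)", "exec"))
```

Only the first and last calls print True, which matches the SyntaxError in the session above.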
    

    Interestingly, you can do this, because expr is a name that += can rebind to the new expression:

    >>> expr = pl.col("random")
    >>> expr += 500
    >>> expr
    <Expr ['[(col("random")) + (500)]'] at 0x701D91AE4800>
    >>> df.select(expr)
    shape: (5, 1)
    ┌────────────┐
    │ random     │
    │ ---        │
    │ f64        │
    ╞════════════╡
    │ 500.773956 │
    │ 500.438878 │
    │ 500.858598 │
    │ 500.697368 │
    │ 500.094177 │
    └────────────┘
    

    So this works:

    df = df.with_columns(pl.col('random') + 500.0)
    

    because it is basically equivalent to the previous example: you just create the expression on the same line as the with_columns call.