Dataframe:
import numpy as np
import polars as pl

rng = np.random.default_rng(42)
df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": rng.random(5),
        "A": [True, True, False, False, False],
    }
)
Currently, to add a constant to a column I do:
df = df.with_columns(pl.col('random') + 500.0)
Questions:
Why does df = df.with_columns(pl.col('random') += 500.0) throw a SyntaxError?
Various AIs tell me that df['random'] = df['random'] + 500 should also work, but it throws the following error instead:
TypeError: DataFrame object does not support `Series` assignment by index
Use `DataFrame.with_columns`.
Why is polars throwing an error? I've been using df['random'] to identify the random column in other parts of my code, and it worked.
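They are suggesting how it's done in pandas, because of overlapping keywords like dataframe and python. But polars deliberately does not work the same way. Reading a column with df['random'] is fine (it returns a Series), which is why it worked elsewhere in your code, but assigning to df['random'] is not supported: new or modified columns go through with_columns instead.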
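With += too, it's really just a matter of syntax. pl.col is an object (an instance of <class 'polars.functions.col.ColumnFactory'>), and calling it creates an expression (<Expr ['col("random")'] at 0x701D91A7D850>). But the result of a call is not something you can assign to: += needs a name, attribute or subscript on its left-hand side, so even on its own line
pl.col('random') += 500.0
is rejected by the parser, unlike
a += 1
which is valid syntax (if a does not exist yet, it only fails at runtime, with a NameError).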
More precisely, += is a statement rather than an expression, so it cannot appear inside any function call:
>>> import math
>>> a = 10
>>> math.pow(a += 10, 2)
File "<stdin>", line 1
math.pow(a += 10, 2)
^^
SyntaxError: invalid syntax
>>> math.pow(a + 10, 2)
400.0
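The error really comes from the parser, which can be checked without polars at all. This stdlib-only sketch uses the built-in compile() to test whether a snippet is syntactically valid; nothing is executed, so f, a and x are just placeholder names and never need to exist.

```python
def is_valid_syntax(source: str) -> bool:
    """Return True if `source` parses as Python; the code is never run."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


snippets = {
    "f(a += 10)": "augmented assignment inside a call",
    "f(x) += 1": "call result as the target of +=",
    "f(a + 10)": "plain expression as an argument",
    "a += 1": "augmented assignment to a name",
}

for source, description in snippets.items():
    print(f"{source!r:14} {description:40} valid={is_valid_syntax(source)}")
```

The first two cases correspond to the two problems in pl.col('random') += 500.0: the += sits inside the with_columns call, and its target is a call result.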
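Interestingly, this does work, because expr is an ordinary variable, so expr += 500 simply rebinds it to the new expression expr + 500: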
>>> expr = pl.col("random")
>>> expr += 500
>>> expr
<Expr ['[(col("random")) + (500)]'] at 0x701D91AE4800>
>>> df.select(expr)
shape: (5, 1)
┌────────────┐
│ random │
│ --- │
│ f64 │
╞════════════╡
│ 500.773956 │
│ 500.438878 │
│ 500.858598 │
│ 500.697368 │
│ 500.094177 │
└────────────┘
So this works:
df = df.with_columns(pl.col('random') + 500.0)
because it is essentially the previous example: the expression is built inline, as the argument to with_columns, instead of being stored in a variable first.