In polars, where dt is a DataFrame,
dt.with_columns(
    new_column=pl.lit('some_text')
)
will add a new column named new_column filled with some_text, and
dt.insert_at_idx(index: int, series: pl.Series)
will insert a series at a given index.
But is there a way to combine the two tasks, i.e. add a pl.lit() column at a specific index?
I tried
dt.insert_at_idx(index:int, pl.lit("some_text"))
but that did not work.
Does anyone know how to add a pl.lit() column at a specified index?
I would do it by first creating a list of the existing columns, then adding the new column as an expression to that list using .insert, and finally using select to get the columns in that order.
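import polars as pl

df = pl.DataFrame(data={'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [3, 4, 5]})
cols = df.columns
# Insert the literal expression at position 2 among the column names
cols.insert(2, pl.lit('some_text').alias("newcol"))
df = df.select(cols)
df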
shape: (3, 4)
┌─────┬─────┬───────────┬─────┐
│ a ┆ b ┆ newcol ┆ c │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ i64 │
╞═════╪═════╪═══════════╪═════╡
│ 1 ┆ 2 ┆ some_text ┆ 3 │
│ 2 ┆ 3 ┆ some_text ┆ 4 │
│ 3 ┆ 4 ┆ some_text ┆ 5 │
└─────┴─────┴───────────┴─────┘
I think the methods that modify the df in place might become deprecated at some point, so it may be best not to use them, or at least not to count on them. I'm not even sure if there are others. You could shortcut the above with a custom function like:
def with_at_idx(df, index, *args, **kwargs):
    if len(args) + len(kwargs) > 1:
        raise ValueError("Only one new column allowed")
    # You could, of course, take out this error and the for
    # loops will continue to work, but then you need to deal
    # with precedence between args and kwargs
    cols = df.columns
    for arg in args:
        cols.insert(index, arg)
    for colname, arg in kwargs.items():
        cols.insert(index, arg.alias(colname))
    return df.select(cols)

# Attach the function to DataFrame so it can be called as a method
pl.DataFrame.with_at_idx = with_at_idx
Then you can do:
df.with_at_idx(2, new_column=pl.lit(5))
shape: (3, 4)
┌─────┬─────┬────────────┬─────┐
│ a ┆ b ┆ new_column ┆ c │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i32 ┆ i64 │
╞═════╪═════╪════════════╪═════╡
│ 1 ┆ 2 ┆ 5 ┆ 3 │
│ 2 ┆ 3 ┆ 5 ┆ 4 │
│ 3 ┆ 4 ┆ 5 ┆ 5 │
└─────┴─────┴────────────┴─────┘
OR
df.with_at_idx(1, pl.lit("some_text").alias("newcol"))
shape: (3, 4)
┌─────┬───────────┬─────┬─────┐
│ a ┆ newcol ┆ b ┆ c │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ i64 │
╞═════╪═══════════╪═════╪═════╡
│ 1 ┆ some_text ┆ 2 ┆ 3 │
│ 2 ┆ some_text ┆ 3 ┆ 4 │
│ 3 ┆ some_text ┆ 4 ┆ 5 │
└─────┴───────────┴─────┴─────┘
This does NOT modify the df in place, so you'd need to do
df = df.with_at_idx(...)
to update the underlying df.