python-polars

How to insert a polars.lit("some_string") at a specific index?


In Polars, where dt is a DataFrame,

dt.with_columns(
    new_column=pl.lit('some_text')
)

will add a new column named new_column in which every row holds 'some_text'.

Similarly,

dt.insert_at_idx(index: int, series: pl.Series)

will insert a Series at a given index.

But is there a way to combine the two tasks, i.e. add a pl.lit() column at a specific index?

I tried

dt.insert_at_idx(index, pl.lit("some_text"))

but that did not work.

Does anyone know how to insert a pl.lit() at a specified index?


Solution

  • I would do it by first creating a list of the existing column names, then inserting the new column as an expression into that list with list.insert, and finally using select to get the columns in that order.

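    One-time method

    import polars as pl

    df = pl.DataFrame(data={'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [3, 4, 5]})
    cols = df.columns  # plain Python list of column names
    cols.insert(2, pl.lit('some_text').alias("newcol"))  # literal expression at position 2
    df = df.select(cols)  # select accepts names and expressions, kept in list order
    df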
    shape: (3, 4)
    ┌─────┬─────┬───────────┬─────┐
    │ a   ┆ b   ┆ newcol    ┆ c   │
    │ --- ┆ --- ┆ ---       ┆ --- │
    │ i64 ┆ i64 ┆ str       ┆ i64 │
    ╞═════╪═════╪═══════════╪═════╡
    │ 1   ┆ 2   ┆ some_text ┆ 3   │
    │ 2   ┆ 3   ┆ some_text ┆ 4   │
    │ 3   ┆ 4   ┆ some_text ┆ 5   │
    └─────┴─────┴───────────┴─────┘
    

    I think the methods that modify the df in place might become deprecated at some point, so it may be best not to use them, or at least not to count on them. I'm not even sure there are others besides insert_at_idx. You could shortcut the above with a custom function like:

    Functional method

    import polars as pl

    def with_at_idx(df, index, *args, **kwargs):
        """Return a new DataFrame with one extra column inserted at position index."""
        if len(args) + len(kwargs) > 1:
            raise ValueError("Only one new column allowed")
        # You could, of course, take out this error and the for
        # loops will continue to work, but then you need to deal
        # with precedence between args and kwargs
        cols = df.columns
        for arg in args:
            cols.insert(index, arg)
        for colname, arg in kwargs.items():
            cols.insert(index, arg.alias(colname))  # keyword name becomes the column name
        return df.select(cols)

    # Attach the helper as a DataFrame method (monkey-patching)
    pl.DataFrame.with_at_idx = with_at_idx
    

    Then you can do:

    df.with_at_idx(2, new_column=pl.lit(5))
    shape: (3, 4)
    ┌─────┬─────┬────────────┬─────┐
    │ a   ┆ b   ┆ new_column ┆ c   │
    │ --- ┆ --- ┆ ---        ┆ --- │
    │ i64 ┆ i64 ┆ i32        ┆ i64 │
    ╞═════╪═════╪════════════╪═════╡
    │ 1   ┆ 2   ┆ 5          ┆ 3   │
    │ 2   ┆ 3   ┆ 5          ┆ 4   │
    │ 3   ┆ 4   ┆ 5          ┆ 5   │
    └─────┴─────┴────────────┴─────┘
    

    OR

    df.with_at_idx(1, pl.lit("some_text").alias("newcol"))
    shape: (3, 4)
    ┌─────┬───────────┬─────┬─────┐
    │ a   ┆ newcol    ┆ b   ┆ c   │
    │ --- ┆ ---       ┆ --- ┆ --- │
    │ i64 ┆ str       ┆ i64 ┆ i64 │
    ╞═════╪═══════════╪═════╪═════╡
    │ 1   ┆ some_text ┆ 2   ┆ 3   │
    │ 2   ┆ some_text ┆ 3   ┆ 4   │
    │ 3   ┆ some_text ┆ 4   ┆ 5   │
    └─────┴───────────┴─────┴─────┘
    

    This does NOT modify the df in place, so you need to reassign the result:

    df = df.with_at_idx(...)