Search code examples
pythonpython-polars

How to assign column values based on another column iteratively with Polars


For these two dfs, I want to check for each i in df1["TS"] if df["TS"] == df1["TS}, then assign the value in "Dr" that corresponds to i to the "mmsi" column:

df = pl.DataFrame({"TS": [1, 2, 3, 4, 5, 6, 7], "mmsi":[11,12,13,14,15,16,17]})


df1 = pl.DataFrame({
"TS": [4, 6, 7], 
"Dr": [21,22,23]})  

I want the output of df["mmsi"] to be: [11,12,13,21,15,22,23]


Solution

  • I suggest using a "left" join, followed by a fill_null, to fill in values of Dr that are not found.

    df.join(
        df1,
        on="TS",
        how="left"
    ).with_columns(pl.col('Dr').fill_null(pl.col('mmsi')))
    
    shape: (7, 3)
    ┌─────┬──────┬─────┐
    │ TS  ┆ mmsi ┆ Dr  │
    │ --- ┆ ---  ┆ --- │
    │ i64 ┆ i64  ┆ i64 │
    ╞═════╪══════╪═════╡
    │ 1   ┆ 11   ┆ 11  │
    │ 2   ┆ 12   ┆ 12  │
    │ 3   ┆ 13   ┆ 13  │
    │ 4   ┆ 14   ┆ 21  │
    │ 5   ┆ 15   ┆ 15  │
    │ 6   ┆ 16   ┆ 22  │
    │ 7   ┆ 17   ┆ 23  │
    └─────┴──────┴─────┘
    

    Your result is in the Dr column. If needed, you can drop/rename columns so that mmsi is the final column.

    df = (
        df.join(df1, on="TS", how="left")
        .with_columns(pl.col("Dr").fill_null(pl.col("mmsi")))
        .drop("mmsi")
        .rename({"Dr": "mmsi"})
    )
    print(df)
    
    shape: (7, 2)
    ┌─────┬──────┐
    │ TS  ┆ mmsi │
    │ --- ┆ ---  │
    │ i64 ┆ i64  │
    ╞═════╪══════╡
    │ 1   ┆ 11   │
    │ 2   ┆ 12   │
    │ 3   ┆ 13   │
    │ 4   ┆ 21   │
    │ 5   ┆ 15   │
    │ 6   ┆ 22   │
    │ 7   ┆ 23   │
    └─────┴──────┘
    

    Taken in steps, the "left" join will yield the following.

    df.join(
        df1,
        on="TS",
        how="left"
    )
    
    shape: (7, 3)
    ┌─────┬──────┬──────┐
    │ TS  ┆ mmsi ┆ Dr   │
    │ --- ┆ ---  ┆ ---  │
    │ i64 ┆ i64  ┆ i64  │
    ╞═════╪══════╪══════╡
    │ 1   ┆ 11   ┆ null │
    │ 2   ┆ 12   ┆ null │
    │ 3   ┆ 13   ┆ null │
    │ 4   ┆ 21   ┆ 21   │
    │ 5   ┆ 15   ┆ null │
    │ 6   ┆ 22   ┆ 22   │
    │ 7   ┆ 23   ┆ 23   │
    └─────┴──────┴──────┘
    

    The fill_null step will then fill in any missing values in the Dr column using the corresponding values in the mmsi column.

    The performance of this will be much better than iterating over values using a for loop.