For these two dfs, I want to check for each i in df1["TS"] if df["TS"] == df1["TS}, then assign the value in "Dr" that corresponds to i to the "mmsi" column:
df = pl.DataFrame({"TS": [1, 2, 3, 4, 5, 6, 7], "mmsi":[11,12,13,14,15,16,17]})
df1 = pl.DataFrame({
"TS": [4, 6, 7],
"Dr": [21,22,23]})
I want the output of df["mmsi"] to be: [11,12,13,21,15,22,23]
I suggest using a "left" join
, followed by a fill_null
, to fill in values of Dr
that are not found.
df.join(
df1,
on="TS",
how="left"
).with_columns(pl.col('Dr').fill_null(pl.col('mmsi')))
shape: (7, 3)
┌─────┬──────┬─────┐
│ TS ┆ mmsi ┆ Dr │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪══════╪═════╡
│ 1 ┆ 11 ┆ 11 │
│ 2 ┆ 12 ┆ 12 │
│ 3 ┆ 13 ┆ 13 │
│ 4 ┆ 14 ┆ 21 │
│ 5 ┆ 15 ┆ 15 │
│ 6 ┆ 16 ┆ 22 │
│ 7 ┆ 17 ┆ 23 │
└─────┴──────┴─────┘
Your result is in the Dr
column. If needed, you can drop/rename columns so that mmsi
is the final column.
df = (
df.join(df1, on="TS", how="left")
.with_columns(pl.col("Dr").fill_null(pl.col("mmsi")))
.drop("mmsi")
.rename({"Dr": "mmsi"})
)
print(df)
shape: (7, 2)
┌─────┬──────┐
│ TS ┆ mmsi │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════╡
│ 1 ┆ 11 │
│ 2 ┆ 12 │
│ 3 ┆ 13 │
│ 4 ┆ 21 │
│ 5 ┆ 15 │
│ 6 ┆ 22 │
│ 7 ┆ 23 │
└─────┴──────┘
Taken in steps, the "left" join will yield the following.
df.join(
df1,
on="TS",
how="left"
)
shape: (7, 3)
┌─────┬──────┬──────┐
│ TS ┆ mmsi ┆ Dr │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪══════╪══════╡
│ 1 ┆ 11 ┆ null │
│ 2 ┆ 12 ┆ null │
│ 3 ┆ 13 ┆ null │
│ 4 ┆ 21 ┆ 21 │
│ 5 ┆ 15 ┆ null │
│ 6 ┆ 22 ┆ 22 │
│ 7 ┆ 23 ┆ 23 │
└─────┴──────┴──────┘
The fill_null
step will then fill in any missing values in the Dr
column using the corresponding values in the mmsi
column.
The performance of this will be much better than iterating over values using a for loop.