I have a polars dataframe like so:
df = pl.from_repr("""
┌─────────────────────┬─────────┬─────────┐
│ time ┆ 1 ┆ 2 │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ f64 │
╞═════════════════════╪═════════╪═════════╡
│ 2021-10-02 00:05:00 ┆ 2.9048 ┆ 2.8849 │
│ 2021-10-02 00:10:00 ┆ 48224.0 ┆ 48068.0 │
└─────────────────────┴─────────┴─────────┘
""")
and a masking dataframe with similar columns and time value like so:
df_mask = pl.from_repr("""
┌─────────────────────┬───────┬───────┐
│ time ┆ 1 ┆ 2 │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ bool ┆ bool │
╞═════════════════════╪═══════╪═══════╡
│ 2021-10-02 00:05:00 ┆ false ┆ false │
│ 2021-10-02 00:10:00 ┆ true ┆ true │
└─────────────────────┴───────┴───────┘
""")
I am looking for this result:
shape: (2, 3)
┌─────────────────────┬────────┬─────────┐
│ time ┆ 1 ┆ 2 │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ f64 │
╞═════════════════════╪════════╪═════════╡
│ 2021-10-02 00:05:00 ┆ null ┆ null │
│ 2021-10-02 00:10:00 ┆ 2.8849 ┆ 48068.0 │
└─────────────────────┴────────┴─────────┘
Here I only show with 2 columns '1' and '2' but there could any number of them.
Any help would be appreciated!
Having columns in a single DataFrame
has guarantees you don't have when you have data in separate tables.
Masking out values by columns in another DataFrame
is a potential for errors caused by different lengths. For this reason polars does not encourage such operations and therefore you must first create a single DataFrame
from the two, and then select
the columns/computations you need.
So let's do that.
The first thing you can do is join
the two tables. This guaranteed to work on DataFrame
s of different sizes.
df_a.join(df_mask, on="time", suffix="_mask")
This comes however with a price, as joining is not free.
If you are 100% certain your dataframes have the same height, you can use a horizontal concat
.
(
pl.concat(
[df_a, df_mask.select(pl.all().name.suffix("_mask"))], how="horizontal"
).select(
[pl.col("time")]
+ [
pl.when(pl.col(f"{name}_mask")).then(pl.col(name)).otherwise(None)
for name in ["1", "2"]
]
)
)
In the final select
query we take the columns we want. And compute the masked values with a when -> then -> otherwise
branch.
This outputs:
shape: (2, 3)
┌─────────────────────┬─────────┬─────────┐
│ time ┆ 1 ┆ 2 │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ f64 │
╞═════════════════════╪═════════╪═════════╡
│ 2021-10-02 00:05:00 ┆ null ┆ null │
│ 2021-10-02 00:10:00 ┆ 48224.0 ┆ 48068.0 │
└─────────────────────┴─────────┴─────────┘