I have data that is constantly changing and sometimes the column used to join two dataframes has only null values. Is there a way to do this in Polars? For instance, in Spark I used [EqSafeNull][1].
Example:
dataframe1 content
key | value |
---|---|
a | 1 |
b | 2 |
c | 3 |
dataframe2 content
key | value |
---|---|
null | 4 |
null | 5 |
null | 6 |
# This fails
dataframe1.join(dataframe2, on="key", how="left")
Output:
*** exceptions.ComputeError: datatypes of join keys don't match
There are multiple ways to solve this and few people have put other valid solutions. This was my take:
dataframe2 = dataframe2.with_columns(pl.col("key").fill_null(pl.lit("null")))
It's probably not the nicest solution as there are chances to run into conflicts in the remote scenario that there's a real key call 'null'... but it gets me out of the error.
I have also found this issue from 2022. There seems to be more a decision from developers rather than a missing feature. I like the reason why this isn't allowed but it is not convenient. Either way, there's a way to move forward.