Search code examples
python-polars

Can I make a left join in Python Polars with a column that has null values?


I have data that is constantly changing and sometimes the column used to join two dataframes has only null values. Is there a way to do this in Polars? For instance, in Spark I used [EqSafeNull][1].

Example:

dataframe1 content

key value
a 1
b 2
c 3

dataframe2 content

key value
null 4
null 5
null 6
# This fails    
dataframe1.join(dataframe2, on="key", how="left")

Output:
*** exceptions.ComputeError: datatypes of join keys don't match

Solution

  • There are multiple ways to solve this and few people have put other valid solutions. This was my take:

    dataframe2 = dataframe2.with_columns(pl.col("key").fill_null(pl.lit("null")))
    

    It's probably not the nicest solution as there are chances to run into conflicts in the remote scenario that there's a real key call 'null'... but it gets me out of the error.

    I have also found this issue from 2022. There seems to be more a decision from developers rather than a missing feature. I like the reason why this isn't allowed but it is not convenient. Either way, there's a way to move forward.