Tags: rust, inner-join, categorical-data, rust-polars

Lazy join multiple DataFrames on a Categorical


I'm trying to implement the sample "Lazy join multiple DataFrames on a Categorical":

use polars::prelude::*;

fn lazy_example(mut df_a: LazyFrame, mut df_b: LazyFrame) -> Result<DataFrame> {

    let q1 = df_a.with_columns(vec![
        col("a").cast(DataType::Categorical),
    ]);

    let q2 = df_b.with_columns(vec![
        col("b").cast(DataType::Categorical)
    ]);
    q1.inner_join(q2, col("a"), col("b"), None).collect()
}

I get this error:

error[E0308]: mismatched types
   --> src\main.rs:6:23
    |
6   |         col("a").cast(DataType::Categorical),
    |                  ---- ^^^^^^^^^^^^^^^^^^^^^ expected enum `polars::prelude::DataType`, found fn item
    |                  |
    |                  arguments to this function are incorrect
    |
    = note: expected enum `polars::prelude::DataType`
            found fn item `fn(Option<Arc<RevMapping>>) -> polars::prelude::DataType {polars::prelude::DataType::Categorical}`
note: associated function defined here
   --> C:\Users\rnio\.cargo\registry\src\github.com-1ecc6299db9ec823\polars-lazy-0.23.1\src\dsl\mod.rs:555:12
    |
555 |     pub fn cast(self, data_type: DataType) -> Self {
    |            ^^^^
help: use parentheses to instantiate this tuple variant
    |
6   |         col("a").cast(DataType::Categorical(_)),
    |                                            +++
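The E0308 message says "found fn item" because naming a tuple variant such as Categorical without parentheses gives you its constructor function, not a value of the enum. Here is a minimal sketch of that behavior. It uses a simplified, hypothetical DataType and RevMapping that only mimic the shape shown in the error. These are not the polars types.

```rust
use std::sync::Arc;

// Hypothetical stand-in for polars' RevMapping; only its presence in the type matters here.
#[derive(Debug)]
pub struct RevMapping;

// Simplified, hypothetical mirror of the variant shown in the compiler output.
#[derive(Debug)]
pub enum DataType {
    Categorical(Option<Arc<RevMapping>>),
}

// Takes an enum value, like `cast(self, data_type: DataType)` does.
pub fn describe(dt: DataType) -> &'static str {
    match dt {
        DataType::Categorical(None) => "categorical (no mapping)",
        DataType::Categorical(Some(_)) => "categorical (with mapping)",
    }
}

fn main() {
    // Naming the tuple variant without arguments yields a constructor function,
    // not a `DataType` value; `describe(DataType::Categorical)` would fail with E0308.
    let ctor: fn(Option<Arc<RevMapping>>) -> DataType = DataType::Categorical;

    // Calling the constructor (here with `None`) produces an actual `DataType`.
    println!("{}", describe(ctor(None)));
    println!("{}", describe(DataType::Categorical(None)));
}
```

This is also why the second error appears: once parentheses are added, the constructor still expects its one Option argument.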


I applied the suggested fix:

col("a").cast(DataType::Categorical()),
col("b").cast(DataType::Categorical()),

and got the next error:

error[E0061]: this enum variant takes 1 argument but 0 arguments were supplied
   --> src\main.rs:7:23
    |
7   |         col("a").cast(DataType::Categorical()),
    |                       ^^^^^^^^^^^^^^^^^^^^^-- an argument of type `Option<Arc<RevMapping>>` is missing
    |
note: tuple variant defined here
   --> C:\Users\rnio\.cargo\registry\src\github.com-1ecc6299db9ec823\polars-core-0.23.1\src\datatypes\mod.rs:707:5
    |
707 |     Categorical(Option<Arc<RevMapping>>),
    |     ^^^^^^^^^^^
help: provide the argument
    |
7   |         col("a").cast(DataType::Categorical(/* Option<Arc<RevMapping>> */)),
    |                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So it's missing an argument for Categorical(), even though that argument will not be used:

// The RevMapping has the internal state. This is ignored with casts, comparisons, hashing etc.

https://docs.rs/polars/latest/polars/datatypes/enum.RevMapping.html
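The quoted doc comment suggests why None is an acceptable argument for a cast target: the mapping is internal state, not part of what makes the type "categorical". Below is a minimal sketch of an enum whose equality ignores that payload. It assumes simplified, hypothetical DataType and RevMapping types. It illustrates the idea in the doc comment and is not polars' actual implementation.

```rust
use std::sync::Arc;

// Hypothetical stand-in: a reverse mapping from category index to string.
#[derive(Debug)]
pub struct RevMapping(pub Vec<String>);

#[derive(Debug)]
pub enum DataType {
    Int32,
    Categorical(Option<Arc<RevMapping>>),
}

// Compare only which variant is present and ignore the RevMapping payload
// (an assumption modeled on the doc comment, not polars' source).
impl PartialEq for DataType {
    fn eq(&self, other: &Self) -> bool {
        std::mem::discriminant(self) == std::mem::discriminant(other)
    }
}

fn main() {
    let empty = DataType::Categorical(None);
    let filled = DataType::Categorical(Some(Arc::new(RevMapping(vec![
        "x".to_string(),
        "y".to_string(),
    ]))));

    // Both count as the same categorical type, regardless of the mapping.
    assert_eq!(empty, filled);
    assert_ne!(empty, DataType::Int32);
    println!("Categorical(None) == Categorical(Some(..)): {}", empty == filled);
}
```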

Any idea how to fix this?

Thanks


Solution

  • Thanks to @Dogbert :)

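    Pass None as the Option<Arc<RevMapping>> argument. Here is the working code, which also drops the trailing None argument from inner_join: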

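    use polars::prelude::*;

    // `mut` is not needed: `with_columns` takes the LazyFrame by value.
    fn lazy_example(df_a: LazyFrame, df_b: LazyFrame) -> Result<DataFrame> {
        // Categorical carries an Option<Arc<RevMapping>>; pass None,
        // since the mapping is ignored when casting.
        let q1 = df_a.with_columns(vec![
            col("a").cast(DataType::Categorical(None)),
        ]);

        let q2 = df_b.with_columns(vec![
            col("b").cast(DataType::Categorical(None)),
        ]);

        // Inner join: keep rows where column "a" of q1 matches column "b" of q2.
        q1.inner_join(q2, col("a"), col("b")).collect()
    }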