Search code examples
rustrust-polars

Combine Two Polars Dataframe in a complex way


I have a dataframe with transactions and another dataframe with employees who are assigned to those transactions. Each transaction can have 0...N assignments. I would like to have one dataframe which consists of transaction ids and all of the employee assignments in separate columns. Please see example below:

I have a one dataframe as follows:

TransactionIds Other_Columns..
T1 Cell 2...These don't matter
T2 Cell 4...These don't matter
T3 Cell 4...These don't matter

I have another dataframe as follows:

TransactionIds Assignments
T1 Assignment1
T1 Assignment2
T1 Assignment3
T2 Assignment3
T2 Assignment4
T3 Assignment6
T4 NULL

I would like to have a dataframe which looks like as follows:

TransactionIds Assignment1 Assignment2 Assignment3 AssignmentN
T1 Assignment1 Assignment2 Assignment3 NULL
T2 Assignment3 Assignment4 NULL NULL
T3 NULL NULL NULL NULL

I tried group by and then agg() function. However, it gives me a list which I don't know how to convert to columns. Another problem with this approach is I wouldn't know how many columns to convert this list into. I would like to dynamically determine/create the number of assignment columns from the SELECT.


Solution

  • I was able to resolve this by doing the following:

        let stns: Vec<String> = (1..6).map(|i| format!("Station {i}")).collect();
        let weather = df!(
                "station"=> &stns,
                "temperatures"=> &[
                    "20 5 5 E1 7 13 19 9 6 20",
                    "18 8 16 11 23 E2 8 E2 E2 E2 90 70 40",
                    "19 24 E9 16 6 12 10 22",
                    "E2 E0 15 7 8 10 E1 24 17 13 6",
                    "14 8 E0 16 22 24 E1",
                ],
        )
        .unwrap();
        println!("{}", &weather);
    
        let out = weather
            .clone()
            .lazy()
            .with_columns([col("temperatures").str().split(lit(" "))])
            .with_column(col("temperatures").list().lengths().alias("counts"))
            .with_column(col("counts").max())
            .collect()
            .unwrap();
    
        println!("{}", &out);
    
        let max_is = out.column("counts").unwrap().u32().unwrap().max();
    
        println!("max_is: {:?}", max_is);
    
        let mut selects = vec![];
    
        for i in 0..max_is.unwrap() {
            selects.push(col("temperatures").list().get(lit(i)).alias(format!("Assignment_{}", i + 1).as_str()));
        }
    
        println!("Selects: {:#?}", selects);
    
        let out12 = out.lazy().with_columns(&selects).drop_columns(["foo"]).collect().unwrap();
        println!("{}", &out12);