I have a dataframe with transactions and another dataframe with employees who are assigned to those transactions. Each transaction can have 0...N assignments. I would like to have one dataframe which consists of transaction ids and all of the employee assignments in separate columns. Please see example below:
I have a one dataframe as follows:
TransactionIds | Other_Columns.. |
---|---|
T1 | Cell 2...These don't matter |
T2 | Cell 4...These don't matter |
T3 | Cell 4...These don't matter |
I have another dataframe as follows:
TransactionIds | Assignments |
---|---|
T1 | Assignment1 |
T1 | Assignment2 |
T1 | Assignment3 |
T2 | Assignment3 |
T2 | Assignment4 |
T3 | Assignment6 |
T4 | NULL |
I would like to have a dataframe which looks like as follows:
TransactionIds | Assignment1 | Assignment2 | Assignment3 | AssignmentN |
---|---|---|---|---|
T1 | Assignment1 | Assignment2 | Assignment3 | NULL |
T2 | Assignment3 | Assignment4 | NULL | NULL |
T3 | NULL | NULL | NULL | NULL |
I tried group by and then agg() function. However, it gives me a list which I don't know how to convert to columns. Another problem with this approach is I wouldn't know how many columns to convert this list into. I would like to dynamically determine/create the number of assignment columns from the SELECT.
I was able to resolve this by doing the following:
let stns: Vec<String> = (1..6).map(|i| format!("Station {i}")).collect();
let weather = df!(
"station"=> &stns,
"temperatures"=> &[
"20 5 5 E1 7 13 19 9 6 20",
"18 8 16 11 23 E2 8 E2 E2 E2 90 70 40",
"19 24 E9 16 6 12 10 22",
"E2 E0 15 7 8 10 E1 24 17 13 6",
"14 8 E0 16 22 24 E1",
],
)
.unwrap();
println!("{}", &weather);
let out = weather
.clone()
.lazy()
.with_columns([col("temperatures").str().split(lit(" "))])
.with_column(col("temperatures").list().lengths().alias("counts"))
.with_column(col("counts").max())
.collect()
.unwrap();
println!("{}", &out);
let max_is = out.column("counts").unwrap().u32().unwrap().max();
println!("max_is: {:?}", max_is);
let mut selects = vec![];
for i in 0..max_is.unwrap() {
selects.push(col("temperatures").list().get(lit(i)).alias(format!("Assignment_{}", i + 1).as_str()));
}
println!("Selects: {:#?}", selects);
let out12 = out.lazy().with_columns(&selects).drop_columns(["foo"]).collect().unwrap();
println!("{}", &out12);