Tags: dataframe, rust, merge, rust-polars

How to merge two DataFrames with different columns / sizes


I'm looking for a way to combine two Polars DataFrames that have different columns and a different number of rows.

df1:

shape: (2, 2)
┌────────┬──────────────────────┐
│ Fruit  ┆ Phosphorus (mg/100g) │
│ ---    ┆ ---                  │
│ str    ┆ i32                  │
╞════════╪══════════════════════╡
│ Apple  ┆ 11                   │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Banana ┆ 22                   │
└────────┴──────────────────────┘

df2:

shape: (1, 3)
┌──────┬─────────────────────┬──────────────────────┐
│ Name ┆ Potassium (mg/100g) ┆ Phosphorus (mg/100g) │
│ ---  ┆ ---                 ┆ ---                  │
│ str  ┆ i32                 ┆ i32                  │
╞══════╪═════════════════════╪══════════════════════╡
│ Pear ┆ 115                 ┆ 12                   │
└──────┴─────────────────────┴──────────────────────┘

Result should be:

shape: (3, 3)
+--------+----------------------+---------------------+
| Fruit  | Phosphorus (mg/100g) | Potassium (mg/100g) |
| ---    | ---                  | ---                 |
| str    | i32                  | i32                 |
+========+======================+=====================+
| Apple  | 11                   | null                |
+--------+----------------------+---------------------+
| Banana | 22                   | null                |
+--------+----------------------+---------------------+
| Pear   | 12                   | 115                 |
+--------+----------------------+---------------------+
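Put differently, the expected result stacks the rows of both frames, takes the union of their columns, and fills every cell a frame does not have with null. Polars calls this kind of concatenation "diagonal". The plain-Rust sketch below only models that behaviour on a toy column-oriented table: `Table`, `Cell`, `concat_diagonal` and `example_frames` are hypothetical names, not Polars API, all values are stored as text for brevity, and it assumes df2's `Name` column has already been renamed to `Fruit`.

```rust
/// One cell of the toy table: `None` plays the role of a Polars null.
/// Values are kept as text to keep the sketch short.
type Cell = Option<&'static str>;

/// Hypothetical column-oriented table (not a Polars type).
#[derive(Debug, Clone)]
struct Table {
    names: Vec<&'static str>,
    columns: Vec<Vec<Cell>>,
}

impl Table {
    /// Number of rows (all columns are assumed to have the same length).
    fn height(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }

    /// Look up a column by name.
    fn column(&self, name: &str) -> Option<&Vec<Cell>> {
        self.names.iter().position(|n| *n == name).map(|i| &self.columns[i])
    }
}

/// Stack `top` above `bottom`. The output has the union of both column sets
/// (in first-seen order); a column missing from one input is padded with `None`.
fn concat_diagonal(top: &Table, bottom: &Table) -> Table {
    let mut names = top.names.clone();
    for n in &bottom.names {
        if !names.contains(n) {
            names.push(*n);
        }
    }
    let columns: Vec<Vec<Cell>> = names
        .iter()
        .map(|n| {
            let mut col: Vec<Cell> = Vec::with_capacity(top.height() + bottom.height());
            for t in [top, bottom].iter() {
                match t.column(n) {
                    // The frame has this column: copy its values.
                    Some(values) => col.extend_from_slice(values),
                    // The frame lacks this column: pad with nulls.
                    None => col.resize(col.len() + t.height(), None),
                }
            }
            col
        })
        .collect();
    Table { names, columns }
}

/// The question's frames, with df2's `Name` column already renamed to `Fruit`.
fn example_frames() -> (Table, Table) {
    let df1 = Table {
        names: vec!["Fruit", "Phosphorus (mg/100g)"],
        columns: vec![
            vec![Some("Apple"), Some("Banana")],
            vec![Some("11"), Some("22")],
        ],
    };
    let df2 = Table {
        names: vec!["Fruit", "Potassium (mg/100g)", "Phosphorus (mg/100g)"],
        columns: vec![vec![Some("Pear")], vec![Some("115")], vec![Some("12")]],
    };
    (df1, df2)
}
```

Stacking like this gives the same rows as a full outer join on all shared columns (assuming the join merges the key columns) as long as no row of df1 has the same key values as a row of df2, which is the case here; when keys do match, the join merges those rows while stacking keeps both.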

Here is the code snippet I am trying to get working:

use polars::prelude::*;

fn main() {
    let df1: DataFrame = df!(
        "Fruit" => &["Apple", "Banana"],
        "Phosphorus (mg/100g)" => &[11, 22]
    )
    .unwrap();

    let df2: DataFrame = df!(
        "Name" => &["Pear"],
        "Potassium (mg/100g)" => &[115],
        "Phosphorus (mg/100g)" => &[12]
    )
    .unwrap();

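    // A left join keeps each row of df1 exactly once (2 rows), and df2's
    // non-key columns are appended to df1's two columns, giving shape (2, 4).
    let df3: DataFrame = df1
        .join(&df2, ["Fruit"], ["Name"], JoinType::Left, None)
        .unwrap();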

    assert_eq!(df3.shape(), (3, 3));
    println!("{}", df3);
}

What I am looking for is a FULL OUTER JOIN.

The error I get:

thread 'main' panicked at 'assertion failed: (left == right) left: (2, 4), right: (3, 3)', src\main.rs:18:5
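The (2, 4) shape follows from the join semantics. A left join keeps each row of df1 exactly once, so there are 2 rows; the columns are df1's two plus df2's three minus the right-hand key `Name`, and df2's `Phosphorus (mg/100g)` gets a suffix because df1 already has a column of that name. Assuming the key columns are merged into one, a full outer join on `Fruit`/`Name` alone would give 3 rows but still 4 columns, and only joining on both `Fruit`/`Name` and `Phosphorus (mg/100g)` reaches (3, 3). The std-only sketch below does that arithmetic; `join_shape` is a hypothetical helper, and it assumes unique keys on each side and merged key columns.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Output shape (rows, columns) of an equi-join, under simplifying assumptions:
/// keys are unique within each side, and the right frame's key columns are not
/// repeated in the output (for a full outer join this assumes the key columns
/// are merged into one). `left_width`/`right_width` count all columns,
/// key columns included; `n_key_cols` is the number of key columns per side.
fn join_shape<K: Hash + Eq + Clone>(
    left_keys: &[K],
    left_width: usize,
    right_keys: &[K],
    right_width: usize,
    n_key_cols: usize,
    full_outer: bool,
) -> (usize, usize) {
    let width = left_width + right_width - n_key_cols;
    let rows = if full_outer {
        // One row per distinct key seen on either side.
        left_keys
            .iter()
            .chain(right_keys)
            .cloned()
            .collect::<HashSet<K>>()
            .len()
    } else {
        // Left join: every left row is kept exactly once, matched or not.
        left_keys.len()
    };
    (rows, width)
}
```

For example, `join_shape(&["Apple", "Banana"], 2, &["Pear"], 3, 1, false)` reproduces the (2, 4) from the panic message, while using `(fruit, phosphorus)` tuples as keys with `n_key_cols = 2` and `full_outer = true` gives the desired (3, 3).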


Solution

  • Thanks to @Ayaz :) I was able to write a generic version that finds the shared column names itself, so I don't need to specify them each time.

    Here is my version of the FULL OUTER JOIN of two DataFrames:

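    use polars::prelude::*;
    use array_tool::vec::Intersect; // provides `intersect` on Vec
    
    /// Full outer join of `df1` and `df2` on every column name they share.
    fn concat_df(df1: &DataFrame, df2: &DataFrame) -> Result<DataFrame, PolarsError> {
        // Nothing to join onto (e.g. an empty accumulator): return a copy of df2.
        if df1.is_empty() {
            return Ok(df2.clone());
        }
    
        let df1_column_names = df1.get_column_names();
        let df2_column_names = df2.get_column_names();
    
        // Every shared column becomes a join key on both sides. Names must match
        // exactly, so e.g. rename "Name" to "Fruit" before calling this.
        let common_column_names = &df1_column_names.intersect(df2_column_names)[..];
    
        df1.join(
            df2,
            common_column_names,
            common_column_names,
            JoinType::Outer,
            None,
        )
    }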
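If you'd rather not depend on `array_tool` just for the intersection, the standard library is enough. The helper below is a hypothetical sketch (the name `common_column_names` is not part of Polars); it works on plain `&str` slices, so depending on your Polars version you may need to convert the output of `get_column_names()` before calling it. It keeps the shared names in the order they appear in the first frame.

```rust
/// Column names present in both `left` and `right`, in `left`'s order.
/// A std-only stand-in for the `array_tool` `intersect` call.
fn common_column_names<'a>(left: &[&'a str], right: &[&'a str]) -> Vec<&'a str> {
    left.iter()
        .copied()
        // Keep a left-hand name only if the right frame has the same name.
        .filter(|name| right.contains(name))
        .collect()
}
```

With the question's frames as posted, the only shared name is `Phosphorus (mg/100g)`, because `Fruit` and `Name` differ, so the fruit names would not be used as join keys; renaming `Name` to `Fruit` in df2 beforehand makes both columns shared.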