Tags: python, dataframe, python-polars

How to add multiple DataFrames with different shapes in polars?


I would like to add multiple DataFrames with different shapes together.

Before adding the DataFrames, the idea would be to reshape them by adding the missing rows (using an "index" column as the reference) and the missing columns (filled with 0).

Here is an example of the inputs:

import polars as pl

a = pl.DataFrame(
    data={"index": [1, 2, 3], "col_1": [1, 0, 0], "col_2": [1, 1, 1]}
)

b = pl.DataFrame(
    data={"index": [1, 2, 3], "col_1": [1, 1, 1], "col_2": [1, 1, 1]}
)

c = pl.DataFrame(
    data={"index": [1, 4, 5], "col_1": [10, 10, 10], "col_3": [1, 1, 1]}
)

The expected result would be:

shape: (5, 4)
┌───────┬───────┬───────┬───────┐
│ index ┆ col_1 ┆ col_2 ┆ col_3 │
│ ---   ┆ ---   ┆ ---   ┆ ---   │
│ i64   ┆ i64   ┆ i64   ┆ i64   │
╞═══════╪═══════╪═══════╪═══════╡
│ 1     ┆ 12    ┆ 2     ┆ 1     │
│ 2     ┆ 1     ┆ 2     ┆ 0     │
│ 3     ┆ 1     ┆ 2     ┆ 0     │
│ 4     ┆ 10    ┆ 0     ┆ 1     │
│ 5     ┆ 10    ┆ 0     ┆ 1     │
└───────┴───────┴───────┴───────┘

The order of the columns is not a concern.

Here is a solution, but it seems a bit clunky:

from functools import reduce

columns = set()

for df in [a, b, c]:
    for column in df.columns:
        columns.add(column)

reshaped_df = []

for df in [a, b, c]:
    for column in columns:
        if column not in df.columns:
            df = df.with_columns(pl.lit(0).alias(column))
    # Append each frame exactly once, after all of its missing columns are added.
    reshaped_df.append(df)

reshaped_df = pl.align_frames(*reshaped_df, on="index", select=columns)

index = reshaped_df[0].select("index").to_series()

result = reduce(
    lambda a, b: a.select(pl.exclude("index").fill_null(value=0)) + b.select(pl.exclude("index").fill_null(value=0)),
    reshaped_df).hstack([index])

Solution

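  • You could use the diagonal .concat() strategy, which stacks the frames vertically and fills any column missing from a frame with null, and then group by "index" and sum.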

    (pl.concat([a, b, c], how="diagonal")
       .group_by("index", maintain_order=True).sum()
    )
    
    shape: (5, 4)
    ┌───────┬───────┬───────┬───────┐
    │ index ┆ col_1 ┆ col_2 ┆ col_3 │
    │ ---   ┆ ---   ┆ ---   ┆ ---   │
    │ i64   ┆ i64   ┆ i64   ┆ i64   │
    ╞═══════╪═══════╪═══════╪═══════╡
    │ 1     ┆ 12    ┆ 2     ┆ 1     │
    │ 2     ┆ 1     ┆ 2     ┆ null  │
    │ 3     ┆ 1     ┆ 2     ┆ null  │
    │ 4     ┆ 10    ┆ null  ┆ 1     │
    │ 5     ┆ 10    ┆ null  ┆ 1     │
    └───────┴───────┴───────┴───────┘