Search code examples
pythonpython-polars

Concatenate/bind polars dataframes in a single dictionary with names


I have a dictionary created by importing data from multiple Excel sheets using polars.read_excel. I'd like to combine them all, by row, into a single dataframe and preserve the sheet name as a column in the new combined dataframe to record which sheet the observation came from.

I found a solution (Similar Example) that does almost what I'm looking for, but the solution explicitly calls the dataframes. I'm looking for a solution similar to the R Tidyverse solution. The Tidy solution to this type of concatenation would be really simple:

df <- list %>% 
    bind_rows(.id = "ID")

As I understand it, the most similar Polars function to bind_rows would be concat(how="vertical").

Is there something like this available in Polars? Something like the following perhaps:

df = dcty.polars.concat(how="vertical")

I've found that if I convert the dictionary to a list, the concat function works to combine the dataframes, but then I lose the dataframe names.

Sample Data

dcty = {
    "df1": pl.DataFrame({'col1': [1, 2], 'col2': ["a", "b"]}),
    "df2": pl.DataFrame({'col1': [3, 4], 'col2': ["c", "d"]}),
}

Expected Output:

shape: (4, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ sheet│
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ a    ┆ df1  │
│ 2    ┆ b    ┆ df1  │
│ 3    ┆ c    ┆ df2  │
│ 4    ┆ d    ┆ df2  │
└──────┴──────┴──────┘

Solution

  • Given the sample input, this generates the expected output:

    pl.concat([df.with_columns(sheet=pl.lit(name)) for name, df in dcty.items()])
    
    shape: (4, 3)
    ┌──────┬──────┬───────┐
    │ col1 ┆ col2 ┆ sheet │
    │ ---  ┆ ---  ┆ ---   │
    │ i64  ┆ str  ┆ str   │
    ╞══════╪══════╪═══════╡
    │ 1    ┆ a    ┆ df1   │
    │ 2    ┆ b    ┆ df1   │
    │ 3    ┆ c    ┆ df2   │
    │ 4    ┆ d    ┆ df2   │
    └──────┴──────┴───────┘
    

    Explanation:

    • .with_columns adds a new column to the original dataframes with the "sheet names" as literals.
    • pl.concat concatenates dataframes (vertically by default). See docs for details.