I have a dictionary created by importing data from multiple Excel sheets using polars.read_excel
. I'd like to combine them all, by row, into a single dataframe and preserve the sheet name as a column in the new combined dataframe to record which sheet the observation came from.
I found a solution (Similar Example) that does almost what I'm looking for, but the solution explicitly calls the dataframes. I'm looking for a solution similar to the R Tidyverse solution. The Tidy solution to this type of concatenation would be really simple:
df <- list %>%
bind_rows(.id = "ID")
As I understand it, the most similar Polars function to bind_rows
would be concat(how="vertical")
.
Is there something like this available in Polars? Something like the following perhaps:
df = dcty.polars.concat(how="vertical")
I've found that if I convert the dictionary to a list, the concat
function works to combine the dataframes, but then I lose the dataframe names.
Sample Data
dcty = {
"df1": pl.DataFrame({'col1': [1, 2], 'col2': ["a", "b"]}),
"df2": pl.DataFrame({'col1': [3, 4], 'col2': ["c", "d"]}),
}
Expected Output:
shape: (4, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ sheet│
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ a ┆ df1 │
│ 2 ┆ b ┆ df1 │
│ 3 ┆ c ┆ df2 │
│ 4 ┆ d ┆ df2 │
└──────┴──────┴──────┘
Given the sample input, this generates the expected output:
pl.concat([df.with_columns(sheet=pl.lit(name)) for name, df in dcty.items()])
shape: (4, 3)
┌──────┬──────┬───────┐
│ col1 ┆ col2 ┆ sheet │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞══════╪══════╪═══════╡
│ 1 ┆ a ┆ df1 │
│ 2 ┆ b ┆ df1 │
│ 3 ┆ c ┆ df2 │
│ 4 ┆ d ┆ df2 │
└──────┴──────┴───────┘
Explanation:
.with_columns
adds a new column to the original dataframes with the "sheet names" as literals.pl.concat
concatenates dataframes (vertically by default). See docs for details.