python-polars transpose on the column of lists


I have a data frame with an index column and a column holding a list of values in each row (the lists can differ in length):

df2 = pl.DataFrame({'x': [1, 2, 3], 'y': [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]})

shape: (3, 2)
┌─────┬───────────────────┐
│ x   ┆ y                 │
│ --- ┆ ---               │
│ i64 ┆ list[str]         │
╞═════╪═══════════════════╡
│ 1   ┆ ["a", "b", "c"]   │
│ 2   ┆ ["d", "e", … "g"] │
│ 3   ┆ ["h", "i", "j"]   │
└─────┴───────────────────┘


I'm trying to transpose each list into rows, one element per row, while retaining the index, so that the resulting data frame looks like:

┌─────┬─────┐
│ x   ┆ yp  │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 1   ┆ "a" │
│ 1   ┆ "b" │
│ 1   ┆ "c" │
│ 2   ┆ "d" │
│ 2   ┆ "e" │
│ 2   ┆ "f" │
│ 2   ┆ "g" │
│ 3   ┆ "h" │
│ ... ┆ ... │
└─────┴─────┘

I could probably iterate through the data frame, but I doubt that would be the most efficient way to do this. Any help would be appreciated.


Solution

    import polars as pl
    
    df2 = pl.DataFrame({'x': [1, 2, 3], 'y': [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]})
    
    # Explode the list column 'y': each list element becomes its own row,
    # and the matching 'x' value is repeated for every element
    df_unnested = df2.explode('y')
    
    # Print the resulting DataFrame
    print(df_unnested)