Search code examples
pythondata-manipulationpython-polars

How to reshape/melt multiple columns in a dataset and split the resulting values into a different number of columns in Polars?


I'm trying to restructure a dataframe by combining values from multiple pairs of columns into a single column for each pair. For example, I've got a column called age_pre and a column called age_post, and I want to combine these two columns into one column called age. Same with the exposure_pre and exposure_post columns.

id age_pre age_post exposure_pre exposure_post ...
123 55 56 0.49 0.51 ...
123 56 57 0.49 0.51 ...
456 49 50 0.80 0.20 ...
456 50 51 0.80 0.20 ...

My desired output would look like this

id age exposure
123 55 0.49
123 56 0.51
123 56 0.49
123 57 0.51
456 49 0.8
456 50 0.2
456 50 0.8
456 51 0.2

I tried creating multiple melted dataframes and using .hstack() to create a dataframe with my desired output, put this seems really inefficient. How can I do this with the Polars syntax?


Solution

  • You can use pl.concat_list then .explode.

    out = df.with_columns(
        pl.concat_list(["age_pre", "age_post"]).alias("age"),
        pl.concat_list(["exposure_pre", "exposure_post"]).alias("exposure"),
    ).select(['id', 'age', 'exposure']).explode(['age', 'exposure'])
        
    print(out)
    shape: (8, 3)
    ┌─────┬─────┬──────────┐
    │ id  ┆ age ┆ exposure │
    │ --- ┆ --- ┆ ---      │
    │ i64 ┆ i64 ┆ f64      │
    ╞═════╪═════╪══════════╡
    │ 123 ┆ 55  ┆ 0.49     │
    │ 123 ┆ 56  ┆ 0.51     │
    │ 123 ┆ 56  ┆ 0.49     │
    │ 123 ┆ 57  ┆ 0.51     │
    │ 456 ┆ 49  ┆ 0.8      │
    │ 456 ┆ 50  ┆ 0.2      │
    │ 456 ┆ 50  ┆ 0.8      │
    │ 456 ┆ 51  ┆ 0.2      │
    └─────┴─────┴──────────┘