Search code examples
pythonpython-polarsexploded

How to extract values based on column names and put it in another column in polars?


I would like to fill a value in a column based on another columns' name, in the Polars library from python (I obtained the following DF by exploding my variables' column names):

Input:

Name Average Median Q1 Variable
Apple 2 3 4 Average
Apple 2 3 4 Median
Apple 2 3 4 Q1
Banana 1 5 10 Average
Banana 1 5 10 Median
Banana 1 5 10 Q1

Expected output:

Name Average Median Q1 Variable Value
Apple 2 3 4 Average 2
Apple 2 3 4 Median 3
Apple 2 3 4 Q1 4
Banana 1 5 10 Average 1
Banana 1 5 10 Median 5
Banana 1 5 10 Q1 10

You can generate the input table here:

test = pl.DataFrame(
    {
        "Name": ["Apple","Apple","Apple","Banana","Banana","Banana"],
        "Average": [2,2,2,1,1,1],
        "Median": [3,3,3,5,5,5],
        "Q1": [4,4,4,10,10,10],
        "Variable":["Average","Median","Q1","Average","Median","Q1"]
    }
)

I have tried:

df = df.with_columns(value = pl.col(f"{pl.col.variable}"))

But that does not work because polars perceives the argument as a function (?). Does anyone know how to do this?

Note: I have also tried to transpose the dataframe, which, not only was that computationally expensive, also did not work! Because it would transpose the DF into a 5-rows-long DF. What I need is a (Name * Number of Variables)-rows-long DF.

That is, for example, I have 3 different names (say, Apple, Banana, and Dragonfruit), and I have 3 variables (Average, Median, Q1), then my DF should be 9-rows-long!


Solution

  • You can use

    • when/then() to check whether the value of the column Variable is the same as the column name.
    • coalesce() to choose first non-empty result.
    cols = [x for x in df.schema if x not in ("Name", "Variable")]
    
    df.with_columns(
        value = pl.coalesce(
            pl.when(pl.col.Variable == col).then(pl.col(col))
            for col in cols
        )
    )
    
    ┌────────┬─────────┬────────┬─────┬──────────┬───────┐
    │ Name   ┆ Average ┆ Median ┆ Q1  ┆ Variable ┆ value │
    │ ---    ┆ ---     ┆ ---    ┆ --- ┆ ---      ┆ ---   │
    │ str    ┆ i64     ┆ i64    ┆ i64 ┆ str      ┆ i64   │
    ╞════════╪═════════╪════════╪═════╪══════════╪═══════╡
    │ Apple  ┆ 2       ┆ 3      ┆ 4   ┆ Average  ┆ 2     │
    │ Apple  ┆ 2       ┆ 3      ┆ 4   ┆ Median   ┆ 3     │
    │ Apple  ┆ 2       ┆ 3      ┆ 4   ┆ Q1       ┆ 4     │
    │ Banana ┆ 3       ┆ 5      ┆ 10  ┆ Average  ┆ 3     │
    │ Banana ┆ 3       ┆ 5      ┆ 10  ┆ Median   ┆ 5     │
    │ Banana ┆ 3       ┆ 5      ┆ 10  ┆ Q1       ┆ 10    │
    └────────┴─────────┴────────┴─────┴──────────┴───────┘