How to rename a column based on a condition in Polars (Python)?


I am trying to rename a column based on a condition in Polars (Python), but I am getting errors.

Data:

import polars as pl

test_df = pl.DataFrame({'Id': [100118647578,
  100023274028,100023274028,100023274028,100118647578,
  100118647578,100118647578,100023274028,100023274028,
  100023274028,100118647578,100118647578,100023274028,
  100118647578,100118647578,100118647578,100118647578,
  100118647578,100118647578,100023274028,100118647578,
  100118647578,100118647578,100118647578,100023274028,
  100118647578,100118647578,100118647578,100023274028,
  100118647578,100118647578,100023274028],

 'Age': [49,22,25,18,41,45,42,30,28,
  20,44,56,26,53,40,35,29,
  8,55,23,54,36,52,33,29,
  10,34,39,27,51,19,31],

 'Status': [2,1,1,1,1,1,1,3,2,1,1,
  1,2,1,1,1,1,1,1,2,1,1,1,1,2,1,1,
  1,1,1,1,4]})

The code below filters the data by the value passed as an argument and should rename the column based on that same value:

def Age_filter(status_filter_value = 1):
    return (
        test_df
        .filter(pl.col('Status') == status_filter_value)
        .sort(['Id','Age'])
        .groupby('Id')
        .agg( pl.col('Age').first())
        .sort('Id')

        # below part of code is giving error
        .rename({'Age' : pl.when(status_filter_value == 1)
                            .then('30_DPD_MOB')
                            .otherwise(pl.when(status_filter_value == 2)
                                       .then('60_DPD_MOB')
                                       .otherwise(pl.when(status_filter_value == 3)
                                                  .then('90_DPD_MOB')
                                                  .otherwise('120_DPD_MOB')
                                                  )
                                        )
                })
    )

Age_filter()

This gives an error: TypeError: argument 'new': 'Expr' object cannot be converted to 'PyString'

I have also tried the code below, but it does not work either:

def Age_filter1(status_filter_value = 1):
    renamed_value = (
        pl.when(status_filter_value == 1)
        .then('30')
        .otherwise(pl.when(status_filter_value == 2)
                   .then('60')
                   .otherwise(pl.when(status_filter_value == 3)
                              .then('90')
                              .otherwise('120')
                              )
                   )
    )

    return (
        test_df
        .filter(pl.col('Status') == status_filter_value)
        .sort(['Id','Age'])
        .groupby('Id')
        .agg( pl.col('Age').first())
        .sort('Id')
        .rename({'Age' : renamed_value})
    )

Age_filter1()

Solution

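  • As the error states, the rename method expects the new column names to be plain strings: it takes a dict mapping each existing name to its new name. No expressions are needed here. In fact, pl.when and its companions are meant to operate on expressions (such as columns), not on a static int value like status_filter_value.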

    You can do something like this programmatically for your case:

    .rename({'Age' : f'{status_filter_value*30}_DPD_MOB'})
    

    EDIT: Or, as suggested in the comments, set the name directly in the agg with alias:

    .agg(pl.col('Age').first().alias(f'{status_filter_value*30}_DPD_MOB'))
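    One caveat with the f-string above: it only matches the question's intended names for status values 1 to 4, whereas the original when/otherwise chain sent every other value to '120_DPD_MOB'. If that fallback matters, a plain Python lookup is one way to keep it. The sketch below is an assumption-laden illustration, not part of the original answer: the names DPD_NAMES and dpd_column_name are hypothetical, and the mapping is copied from the strings in the question.

```python
# Hypothetical helper (not from the original answer): choose the new column
# name in plain Python, before any Polars method is called.
DPD_NAMES = {1: "30_DPD_MOB", 2: "60_DPD_MOB", 3: "90_DPD_MOB"}


def dpd_column_name(status_filter_value: int) -> str:
    """Return the column name for a status code.

    Any code other than 1, 2 or 3 falls back to '120_DPD_MOB', mirroring the
    final .otherwise(...) branch in the question.
    """
    return DPD_NAMES.get(status_filter_value, "120_DPD_MOB")


if __name__ == "__main__":
    # Print the name chosen for each status value that appears in the data.
    for status in (1, 2, 3, 4):
        print(status, dpd_column_name(status))
```

    Because the helper returns an ordinary str, its result can be passed wherever Polars expects a column name, for example .agg(pl.col('Age').first().alias(dpd_column_name(status_filter_value))) or .rename({'Age' : dpd_column_name(status_filter_value)}).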