Search code examples
pythonpandasdataframelambdadictionary-comprehension

Is there a way to pass a lambda function to reference the dataframe in a dictionary comprehension?


I'm trying to emulate the behavior of R's mutate(across()) with Pandas, and a one-liner way of doing it is through dictionary comprehensions. For example, if I have a DataFrame that looks like this:

df.head()

>>>            jam_cpi_eop  bah_cpi_eop
>>> 1980-01-01       3.038       38.714
>>> 1981-01-01       3.183       42.193
>>> 1982-01-01       3.406       44.115
>>> 1983-01-01       3.974       45.653
>>> 1984-01-01       5.212       47.748

And I want to take the np.log() of jam_cpi_eop and bah_cpi_eop columns of my DataFrame in only one go, I can write the following code:

df1 = df.assign(
        **{f'l{col.name}': np.log(col) for col in [df.jam_cpi_eop, df.bah_cpi_eop]}
    )

df1.head()
>>>             jam_cpi_eop    bah_cpi_eop  ljam_cpi_eop    lbah_cpi_eop
>>> 1980-01-01        3.038         38.714      1.111199        3.656201
>>> 1981-01-01        3.183         42.193      1.157824        3.742254
>>> 1982-01-01        3.406         44.115      1.225539        3.786800
>>> 1983-01-01        3.974         45.653      1.379773        3.821069
>>> 1984-01-01        5.212         47.748      1.650964        3.865937

The problem is that if I need to make more operations on the newly created columns ljam_cpi_eop and lbah_cpi_eop, I can't use df as a reference anymore. Is there a way to pass a lambda function to the list inside the dictionary comprehension?

I would expect something like the following lines, but it doesn't work.

### DOESN'T WORK!
df1 = df.assign(
        **{f'l{col.name}': np.log(col) for col in [lambda x: x.jam_cpi_eop, x.bah_cpi_eop]})

EDIT:

What I need is to take the differences of the logs in a second step. For example, the following code works:

df1 = df.assign(
        **{f'd{col.name}': col - col.shift(1) for col in [df.jam_cpi_eop, df.bah_cpi_eop]})

But I want to take the difference of the log variables that I created in the first step, and not of the original columns.

I know that I can do it by assigning an intermediate dataframe, but I'm interested in knowing if there's a way to do it without creating an intermediate DataFrame.


Solution

  • Is there a way to pass a lambda function to the list inside the dictionary comprehension?

    One way to achieve this is using DataFrame.pipe

    df = (df
     .pipe(lambda x: x.assign(**{f'l{col.name}': np.log(col) for col in [x.jam_cpi_eop, x.bah_cpi_eop]}))
     .pipe(lambda x: x.assign(**{f'l{col.name}': col - col.shift(1) for col in [x.ljam_cpi_eop, x.lbah_cpi_eop]}))
    )
    

    Result:

                  jam_cpi_eop  bah_cpi_eop  ljam_cpi_eop  lbah_cpi_eop  lljam_cpi_eop  llbah_cpi_eop
    1980-01-01        3.038       38.714      1.111199      3.656201            NaN            NaN
    1981-01-01        3.183       42.193      1.157824      3.742254       0.046625       0.086053
    1982-01-01        3.406       44.115      1.225539      3.786800       0.067714       0.044546
    1983-01-01        3.974       45.653      1.379773      3.821069       0.154235       0.034269
    1984-01-01        5.212       47.748      1.650964      3.865937       0.271191       0.044868