Why does lambda work on new columns generated using pandas.DataFrame.assign in Python?

I often use pandas.DataFrame.assign() in order to method chain in Python.

When calculating values from existing columns, I never have to use lambda. But if I want to create a calculated column from a column I created within the same assign statement, I have to use lambda x. The code below works, but I do not understand why the lambda is what makes it work.

Let's say I have an existing DataFrame with columns A, B, C. Using an assign statement, I want to change A by multiplying A and B. I also create a new column D by multiplying B and C. Then I want to multiply C and D. This only works using lambda. Why does lambda remember that I created column D, but the normal df['D'] * df['C'] does not?

A	B	C
One	Two	Three

df = (df
      .assign(A = df['A'] * df['B'],
              D = df['B'] * df['C'],
              E = lambda x: x['D'] * x['C']))

Solution

The pandas documentation for DataFrame.assign says: "Assigning multiple columns within the same assign is possible. Later items in '**kwargs' may refer to newly created or modified columns in 'df'; items are computed and assigned into 'df' in order."
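To make that concrete, here is a minimal sketch with made-up numbers. The values and the column name E are assumptions for illustration only; every argument is written as a callable so that each one sees the columns produced by the arguments before it.

```python
import pandas as pd

# Toy data (assumed): one row so the arithmetic is easy to follow.
df = pd.DataFrame({"A": [1], "B": [2], "C": [3]})

result = df.assign(
    A=lambda x: x["A"] * x["B"],  # 1 * 2 = 2
    D=lambda x: x["B"] * x["C"],  # 2 * 3 = 6
    E=lambda x: x["D"] * x["C"],  # uses the D created just above: 6 * 3 = 18
)

print(result)
```

Note that assign returns a new DataFrame; the original df still has only A, B and C.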

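First, it comes down to the order of evaluation.

With .assign(A = df['A'] * df['B']), the expression df['A'] * df['B'] is evaluated against the original df before df.assign executes. Python always evaluates a call's arguments before it makes the call.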

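import pandas as pd
from datetime import datetime

df = pd.DataFrame({"A": [1], "B": [2], "C": [3]})
assign = df.assign  # keep a reference to the real bound method

def new_value():
    # Assumed helper: logs a timestamp and returns a placeholder column value
    print("Hello from: new_value()")
    print(datetime.now())
    return df["B"] * df["C"]

def debug_assign(**kwargs):
    print("Hello from: assign()")
    print(datetime.now())
    assign(**kwargs)  # result discarded; only the print order matters here

df.assign = debug_assign  # shadow assign on this instance only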

>>> df.assign(D = new_value())
Hello from: new_value()
2023-02-14 16:08:38.424683
Hello from: assign()
2023-02-14 16:08:38.424722
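The same ordering explains the failure in the question. As a rough sketch (toy values and the column name E are assumptions), the eager version fails while looking up df["D"] on the original frame, before assign is ever called, whereas the lambda version defers that lookup:

```python
import pandas as pd

df = pd.DataFrame({"A": [1], "B": [2], "C": [3]})

# Eager: df["D"] is looked up on the original df while the arguments
# are being built, so assign() never even starts.
eager_error = None
try:
    df.assign(D=df["B"] * df["C"], E=df["D"] * df["C"])
except KeyError as exc:
    eager_error = exc

print(type(eager_error).__name__)

# Deferred: the lambda is only called inside assign(), after D exists.
deferred = df.assign(D=df["B"] * df["C"], E=lambda x: x["D"] * x["C"])
print(deferred)
```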

As for the lambda: it is like a "mini-function". Declaring a lambda is like defining a function; nothing inside it is executed at that point.

>>> lambda x: x['D'] * x['C']
<function __main__.<lambda>(x)>

Meaning:

>>> df.assign(D = lambda x: x['D'] * x['C'])

Is similar to doing:

>>> def callback(x): return x['D'] * x['C']
>>> df.assign(D = callback)

Functions can be assigned to variables and passed as arguments.

>>> my_other_print = print
>>> my_other_print
<function print>

They're not executed until they are called with () (notice there is no () in D = callback).

>>> my_other_print("hello")
hello

pandas checks whether each value passed to assign is callable. If it is, pandas calls it with the DataFrame in its current state, that is, with every column created or modified by earlier arguments of the same assign already in place.
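The behaviour can be approximated in a few lines. This is a simplified sketch, not the actual pandas source; simple_assign is a hypothetical name, and the toy data and column name E are assumptions.

```python
import pandas as pd

def simple_assign(frame, **kwargs):
    """Hypothetical, simplified stand-in for DataFrame.assign."""
    out = frame.copy()  # never modify the caller's frame
    for name, value in kwargs.items():  # kwargs keep the order they were passed in
        if callable(value):
            # Call it with the frame built so far, including earlier new columns.
            value = value(out)
        out[name] = value
    return out

df = pd.DataFrame({"A": [1], "B": [2], "C": [3]})

result = simple_assign(
    df,
    D=df["B"] * df["C"],           # already a Series: stored as-is
    E=lambda x: x["D"] * x["C"],   # callable: runs after D has been added
)
print(result)
```

A plain Series such as df['B'] * df['C'] is stored unchanged, which is why it can only ever reflect the original df.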