How can I create a new column in a pandas dataframe using a f-string template with unknown column names?

I want to write a script/function to process a fairly generic pandas dataframe. The data frame is the result of a pd.merge() operation between two data frames, one of which is supplied by the user in the form of a CSV file with arbitrary columns. I know all the values are text but that's about it.

I want to create a new column in the dataframe based on the values of a combination of other columns. This is a fairly easy problem when the column names are known, but in this case I know neither the names of the columns nor how they are to be combined in advance.

Suppose the dataframe df looks like this:

print(df)
     col1 col2
row1 abc  def
row2 ghi  jkl

And suppose also that the desired manipulation is "create a new column by adding suffix _x to the values in col2". It would be nice if the use could express a template for this manipulation using f-strings. In this case, the template might be ..

template = 'f"{col2}_x"'

The double-quoting is is intentional -- I want to delay the evaluation of the template until it is applied to the dataframe. Note that the user will know the names of at least some columns (the ones that they supplied via the CSV file) and so they can specify a template based on these column names.

I was hoping that I could simply use pd.eval() or more specifically df.eval() which evaluates an expression using the namespace provided by the column names.

Something like

df["new_col"] = df.eval(template)

But this returns ...

AttributeError: 'PythonExprVisitor' object has no attribute 'visit_JoinedStr'

.. and I gather that pd.eval() does not support f-strings as of March 2023: https://github.com/pandas-dev/pandas/issues/52310

I guess I could export the dataframe to a dictionary and then iterate over the rows, but I was really hoping for a neat, pandas-esque solution.

Update:

I've also been trying to achieve something simular with pd.DataFrame.apply()

In the concrete case, I can iterate over the dataframe as follows:

ser=pd.Series()
for index,row in df.iterrows():
    locals().update(row.to_dict())
    ser[index]=eval(template)
df["new_col"]=ser

This might have to do, but locals().update() feels like a hack and all the advice I have read recommends vectorising over iterating. I'm also having a tough time wrapping this in a function (named or lambda) to pass into pd.DataFrame.apply() because again I don't know the column names ahead of time.

Solution

Here is one way to do it with Python built-in function eval, which must be used with caution:

import pandas as pd

df = pd.DataFrame({"col1": ["abc", "ghi"], "col2": ["def", "jkl"]})

template = 'df[f"{name}"]+"_x"'

name = "col2"
df["new_col"] = eval(template)

Then:

print(df)
# Output

  col1 col2 new_col
0  abc  def   def_x
1  ghi  jkl   jkl_x