Search code examples
pythonpandasdataframevectorization

Using a list of column names to do operations in pandas dataframe


Say I have a dataframe:

import pandas as pd
df = pd.DataFrame({"A1": [10, 20, 15, 30, 45],
                    "B1": [13, 23, 18, 33, 48],
                    "C1": [17, 27, 22, 37, 52],
                    "A2": [10, 20, 15, 30, 45],
                    "B2": [13, 23, 18, 33, 48],
                    "C2": [17, 27, 22, 37, 52]})
col1_names = ['A1', 'B1', 'C1']
col2_names = ['A2', 'B2', 'C2']
col_new = ['delA', 'delB', 'delC']

I would like to do an operation where I get three new columns in df with values corresponding to the difference between col2_names and col1_names.

for i in range(len(col1_names)):
    df[col_new[i]] = df[col2_names[i]] - df[col1_names[i]]

Is there a way to vectorize and do it without a loop?

Tried this:

df[col_new] = df[col2_names] - df[col1_names]

Expected same result as the above loop solution but I get ValueError: Columns must be same length as key.

Additional: can this be generalized to other operations?


Solution

  • When you are performing operations on data frames, they will align their indices. Row labels are matched to row labels and same with column labels. col1_names and col2_names have nothing in common so you get all NA.

    Try this:

    df[col_new] = df[col2_names].values - df[col1_names].values