Say I have a dataframe:
import pandas as pd
df = pd.DataFrame({"A1": [10, 20, 15, 30, 45],
"B1": [13, 23, 18, 33, 48],
"C1": [17, 27, 22, 37, 52],
"A2": [10, 20, 15, 30, 45],
"B2": [13, 23, 18, 33, 48],
"C2": [17, 27, 22, 37, 52]})
col1_names = ['A1', 'B1', 'C1']
col2_names = ['A2', 'B2', 'C2']
col_new = ['delA', 'delB', 'delC']
I would like to do an operation where I get three new columns in df with values corresponding to the difference between col2_names and col1_names.
for i in range(len(col1_names)):
df[col_new[i]] = df[col2_names[i]] - df[col1_names[i]]
Is there a way to vectorize and do it without a loop?
Tried this:
df[col_new] = df[col2_names] - df[col1_names]
Expected same result as the above loop solution but I get ValueError: Columns must be same length as key
.
Additional: can this be generalized to other operations?
When you are performing operations on data frames, they will align their indices. Row labels are matched to row labels and same with column labels. col1_names
and col2_names
have nothing in common so you get all NA.
Try this:
df[col_new] = df[col2_names].values - df[col1_names].values