I have two dataframes named df and df2 with the same number of rows. I would like to create a new column in df based on a logical comparison, as follows:
df['new_col']='nothing'
df.loc[(df2['col2'].isna()) & (df2['col2'].isna()) & (~df['col'].isna()), 'new_col'] = df['col3']
The code above seems to work for my purpose. However, I expected an error because the two sides of "=" don't match in size: the right-hand side is an entire column (with as many rows as df), yet it is assigned into only a subset of the rows of new_col! I'm really confused now. Am I missing something here?
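When you assign a Series, pandas aligns it on the index labels before assigning. The right-hand Series is effectively reindexed to the rows being set: values are matched by label, not by position, labels missing from the Series become NaN, and extra labels are ignored. So in df.loc[mask, 'new_col'] = df['col3'], each row selected by the mask takes the df['col3'] value with the same label, and the rest of df['col3'] is simply ignored. That is why there is no length mismatch.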
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
# labels 2 and 0 are out of order, label 1 is missing, label 4 is extra
s = pd.Series({2: 30, 0: 10, 4: 40})
df['B'] = s
print(df)
Output:
   A     B
0  1  10.0
1  2   NaN
2  3  30.0
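The same alignment applies to the question's .loc assignment. Below is a minimal sketch, assuming small made-up df/df2 frames that share the default RangeIndex (the column values are hypothetical, and the duplicated df2['col2'] check is written only once). Only the rows selected by the mask receive values, and those values are matched by label: assigning a reversed copy of the Series gives the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's frames; both use the default RangeIndex 0..3.
df = pd.DataFrame({"col": [1.0, np.nan, 3.0, 4.0], "col3": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"col2": [np.nan, np.nan, 5.0, np.nan]})

df["new_col"] = "nothing"
# Boolean mask: True only for rows 0 and 3.
mask = df2["col2"].isna() & df["col"].notna()

# The full-length Series on the right is aligned on the labels the mask selects,
# so rows 0 and 3 get the col3 values with labels 0 and 3; the rest are ignored.
df.loc[mask, "new_col"] = df["col3"]
print(df["new_col"].tolist())  # ['a', 'nothing', 'nothing', 'd']

# Alignment is by label, not position: a reversed copy of the Series gives the same result.
df["reversed_rhs"] = "nothing"
df.loc[mask, "reversed_rhs"] = df["col3"].iloc[::-1]
print(df["reversed_rhs"].tolist())  # ['a', 'nothing', 'nothing', 'd']
```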
If you assign a plain list or NumPy array instead, there is no index to align on: values are assigned by position, so the length must match the number of rows exactly.
# runs fine
df['C'] = [1,2,3]
# triggers error
df['C'] = [1,2]
# ValueError: Length of values (2) does not match length of index (3)
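The same positional rule applies when a boolean mask is involved. A minimal sketch with hypothetical columns x and y: a NumPy array carries no index, so pandas cannot align it, and its length must equal the number of rows the mask selects rather than the number of rows in the frame. The exact error message varies between pandas versions, so only the exception type is relied on here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]})
mask = df["x"] > 2  # selects 2 of the 4 rows

# A 4-element array into 2 selected rows: nothing to align on, so pandas raises.
try:
    df.loc[mask, "y"] = np.array([100, 200, 300, 400])
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Picking the same positions out of the array first gives a matching length.
df.loc[mask, "y"] = np.array([100, 200, 300, 400])[mask.to_numpy()]
print(df["y"].tolist())  # [10, 20, 300, 400]
```

This is also why df.loc[mask, 'new_col'] = df['col3'].to_numpy() would fail in the question's setup whenever the mask selects fewer rows than df has, while the Series version works.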