Tags: python, pandas, dataframe, data-manipulation

Using .loc to create a new column based on existing columns


I have two DataFrames, df and df2, with the same number of rows. I would like to create a new column in df based on a logical comparison, as follows:

df['new_col']='nothing'

df.loc[(df2['col2'].isna()) & (df2['col2'].isna()) & (~df['col'].isna()), 'new_col'] = df['col3']

The code above seems to work fine for my purpose. However, I expected an error because the two sides of the "=" do not match in size: the right-hand side is an entire column (with as many rows as df), while the left-hand side is only a subset of the rows of new_col. I am really confused now. Am I missing something here?


Solution

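  • When the right-hand side of an assignment is a Series, pandas aligns it on the index before assigning. Values are matched to rows by index label, not by position: labels missing from the Series yield NaN, and labels that do not exist in the target are dropped. With .loc and a boolean mask, only the values whose labels fall in the selected rows are used, which is why assigning the full df['col3'] works without an error.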

    import pandas as pd

    df = pd.DataFrame({'A': [1,2,3]})
    # different order, missing indices, extra indices
    s = pd.Series({2: 30, 0: 10, 4: 40})
    df['B'] = s
    print(df)
    

    Output:

       A     B
    0  1  10.0
    1  2   NaN
    2  3  30.0
    

    A list or NumPy array has no index labels to align on, so its length must match exactly.

    # runs fine
    df['C'] = [1,2,3]
    
    # triggers error
    df['C'] = [1,2]
    # ValueError: Length of values (2) does not match length of index (3)
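
To tie this back to the question, here is a minimal sketch of the same kind of masked .loc assignment. The data is made up for illustration; only the column names come from the question, and both frames are assumed to share the default 0..n-1 index.

```python
import pandas as pd

# Hypothetical stand-ins for the question's df and df2 (same default index 0..3).
df = pd.DataFrame({
    "col": [1.0, None, 3.0, 4.0],
    "col3": ["a", "b", "c", "d"],
})
df2 = pd.DataFrame({"col2": [None, None, 5.0, None]})

df["new_col"] = "nothing"

# True where df2['col2'] is missing and df['col'] is present: labels 0 and 3.
mask = df2["col2"].isna() & df["col"].notna()

# The right-hand side is the full column. pandas aligns it on the index and
# uses only the values whose labels are selected by the mask.
df.loc[mask, "new_col"] = df["col3"]

print(df["new_col"].tolist())
# ['a', 'nothing', 'nothing', 'd']
```

Rows 1 and 2 keep the default value because the mask does not select them, even though df['col3'] has values at those labels.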