Search code examples
pythonmissing-data

Vertical lookup between two columns with pandas (python)


I would like to know if it is possible to complete the following table to obtain the expected result (to the right)

  X              Y          X              Y
–––––––––––––––-–-        –––––––––––––––-–-
  Argentina     AR          Argentina     AR
  Brazil       NaN          Brazil        BR
  Brazil        BR          Brazil        BR
  Canada        CA          Canada        CA
  NaN           AR          Argentina     AR
  Canada       NaN          Canada        CA

My intention is to replace NaNs by considering information from columns X and Y, specifically to get instances with copies country/code. Each country has just one code and viceversa.


Solution

  • Edits

    1. the original answer using .transform() would break for old versions of pandas (e.g. 1.1.3).
    2. there are cases where one wants to fill missing X values based on Y, and cases where it is the opposite.
    3. a simpler approach is to just get a unique mapping X: Y.
    def first_valid(g):
        return g.bfill().iloc[0]
    
    m = df.groupby('X')['Y'].apply(first_valid)
    
    >>> m
    X
    Argentina    AR
    Brazil       BR
    Canada       CA
    Name: Y, dtype: object
    

    If you like, you can fill the missing values in the original df, e.g. if you had a larger df with other columns and want to keep the same shape, just filling missing values:

    dct = df.groupby('X')['Y'].apply(first_valid).to_dict()
    new_df = df.assign(
        X=df['X'].fillna(df['Y'].map({v:k for k, v in dct.items()})),
        Y=df['Y'].fillna(df['X'].map(dct)),
    )