I would like to know if it is possible to complete the following table to obtain the expected result (to the right)
X Y X Y
–––––––––––––––-–- –––––––––––––––-–-
Argentina AR Argentina AR
Brazil NaN Brazil BR
Brazil BR Brazil BR
Canada CA Canada CA
NaN AR Argentina AR
Canada NaN Canada CA
My intention is to replace NaNs by considering information from columns X and Y, specifically to get instances with copies country/code. Each country has just one code and viceversa.
Edits
.transform()
would break for old
versions of pandas
(e.g. 1.1.3
).X
values based on Y
, and cases where it is the opposite.X: Y
.def first_valid(g):
return g.bfill().iloc[0]
m = df.groupby('X')['Y'].apply(first_valid)
>>> m
X
Argentina AR
Brazil BR
Canada CA
Name: Y, dtype: object
If you like, you can fill the missing values in the original df
, e.g. if you had a larger df
with other columns and want to keep the same shape, just filling missing values:
dct = df.groupby('X')['Y'].apply(first_valid).to_dict()
new_df = df.assign(
X=df['X'].fillna(df['Y'].map({v:k for k, v in dct.items()})),
Y=df['Y'].fillna(df['X'].map(dct)),
)