Suppose I have the following columns for a dataframe (1, 2, 3, 4):
The desired output should be:
The logic is as follows: for each value in the first column, check if the value exists in the other columns. If it does, assign the value to the corresponding row in the first column. If it doesn't, assign Nan to that row. If a new value appears in another column, assign Nan to the corresponding row in the first column.
I'm trying to do by this:
output_df = pd.DataFrame(columns=df.columns)
for col in df.columns:
values = []
for i, value in enumerate(df[col]):
if value in df['Column1'].values:
values.append(value)
else:
values.append(np.nan)
output_df[col] = values
output_df['Column1'] = df['Column1']
Let me know what other changes I can do to get the desired output
You could try to outer-merge Column1
with each column and concatenate the the last columns of the results.
df_result = pd.concat(
(df[["Column1"]].merge(df[[col]], left_on="Column1", right_on=col, how="outer").iloc[:, [-1]]
for col in df.columns),
axis=1
)
Result for the sample:
Column1 Column2
0 Curl Care Anti-Hairfall 200ml NaN \
1 Curl Care Clean 200ml Curl Care Clean 200ml
2 Curl Care Color Protect 200ml NaN
3 NaN Curl Care Rose 200ml
4 NaN Silidone Damage Control 200ml
Column3 Column4
0 Curl Care Anti-Hairfall 200ml Curl Care Anti-Hairfall 200ml
1 NaN Curl Care Clean 200ml
2 Curl Care Color Protect 200ml Curl Care Color Protect 200ml
3 Curl Care Rose 200ml NaN
4 NaN NaN