Search code examples
pythonpandasdataframenumpyreshape

to check the values of different and columns and if matching then assigned to the rows


Suppose I have the following columns for a dataframe (1, 2, 3, 4):

  • Column1 value: ('Curl Care Anti-Hairfall 200ml', 'Curl Care Clean 200ml', 'Curl Care Color Protect 200ml')
  • Column2 value: ('Curl Care Rose 200ml', 'Silidone Damage Control 200ml', 'Curl Care Clean 200ml')
  • Column3 value: ('Curl Care Rose 200ml', 'Curl Care Color Protect 200ml', 'Curl Care Anti-Hairfall 200ml')
  • Column4 value: ('Curl Care Color Protect 200ml', 'Curl Care Anti-Hairfall 200ml', 'Curl Care Clean 200ml')

The desired output should be:

  • Column1 value: ('Curl Care Anti-Hairfall 200ml', 'Curl Care Clean 200ml', 'Curl Care Color Protect 200ml', NAN, NAN,)
  • Column2 value: (NAN, 'Curl Care Clean 200ml', NAN, 'Curl Care Rose 200ml', 'Silidone Damage Control 200ml',)
  • Column3 value: ('Curl Care Anti-Hairfall 200ml', NAN, 'Curl Care Color Protect 200ml', 'Curl Care Rose 200ml', NAN)
  • Column4 value: ('Curl Care Anti-Hairfall 200ml', 'Curl Care Clean 200ml', 'Curl Care Color Protect 200ml', NAN, NAN)

The logic is as follows: for each value in the first column, check if the value exists in the other columns. If it does, assign the value to the corresponding row in the first column. If it doesn't, assign Nan to that row. If a new value appears in another column, assign Nan to the corresponding row in the first column.

I'm trying to do by this:

output_df = pd.DataFrame(columns=df.columns)

for col in df.columns:
    values = []
    for i, value in enumerate(df[col]):

        if value in df['Column1'].values:

            values.append(value)

        else:

            values.append(np.nan)

    output_df[col] = values

output_df['Column1'] = df['Column1']

Let me know what other changes I can do to get the desired output


Solution

  • You could try to outer-merge Column1 with each column and concatenate the the last columns of the results.

    df_result = pd.concat(
        (df[["Column1"]].merge(df[[col]], left_on="Column1", right_on=col, how="outer").iloc[:, [-1]]
         for col in df.columns),
        axis=1
    )
    

    Result for the sample:

                             Column1                        Column2   
    0  Curl Care Anti-Hairfall 200ml                            NaN  \
    1          Curl Care Clean 200ml          Curl Care Clean 200ml   
    2  Curl Care Color Protect 200ml                            NaN   
    3                            NaN           Curl Care Rose 200ml   
    4                            NaN  Silidone Damage Control 200ml   
    
                             Column3                        Column4  
    0  Curl Care Anti-Hairfall 200ml  Curl Care Anti-Hairfall 200ml  
    1                            NaN          Curl Care Clean 200ml  
    2  Curl Care Color Protect 200ml  Curl Care Color Protect 200ml  
    3           Curl Care Rose 200ml                            NaN  
    4                            NaN                            NaN