Search code examples
pythonpandasmultiple-columnsmatching

Check if two columns are having matching values, but values are not in the same index places(Python, Pandas)


I have two DataFrames about Super Store Sales:

  • the first is named df_orders
  • the second one is named df_returns

In both DataFrames we have a matching column called "Order ID", but df_returns has less rows than df_orders. What I want to do is make a new column in df_orders with the value 'Returned' if the Order ID is present in df_returns and 'Not returned' if not.

Here are samples of both DataFrames:

df_order= {'City':['Prior Lake','Chicago','NY','Prior Lake', 'Round Rock'],
           'Order ID':[86838 ,90154,15000,10000, 12447]}
df_return= {'Order ID':[90154, 86838 ],
           'Returned':['Returned', 'Returned']}

# Create DataFrame from dict
df_orders = pd.DataFrame.from_dict(df_order)
df_returns = pd.DataFrame.from_dict(df_return)

This is how i thought it should be checked but it is definitely not correct cause everywhere says "not returned", but I've checked manually and seen that some orders are matching. Please, help me out.

excel_path = r'C:\Users\Korisnik\Desktop\PythonFiles\Omega\SuperStoreUS.xlsx'
df = pd.read_excel(excel_path, sheet_name=None)

# 1.
df_order = df.get('Orders')
df_returns = df.get('Returns')
df_users = df.get('Users')

df_n.reset_index(drop=True)
df_returns.reset_index(drop=True)
df_n['Status'] = np.where( df_n['Order ID'].equals(df_returns['Order ID'])  and df_returns["Status"] == "Returned", "Returned", "Not returned")

Solution

  • You can use pandas.DataFrame.merge with pandas.Series.fillna :

    df_order = pd.read_excel("SuperStoreUS.xlsx", sheet_name="Orders")
    df_return = pd.read_excel("SuperStoreUS.xlsx", sheet_name="Returns")
    

    Use either :

    # --- To create a new dataframe
    out = df_order.merge(df_return, on="Order ID", how="left")
    out["Status"] = out["Status"].fillna("Not Returned")
    

    Or:

    # --- To update df_order
    df_order = df_order.merge(df_return, on="Order ID", how="left")
    df_order["Status"] = df_order["Status"].fillna("Not Returned")