I have two DataFrames about Super Store Sales:
df_orders
df_returns
In both DataFrames we have a matching column called "Order ID", but df_returns
has less rows than df_orders
. What I want to do is make a new column in df_orders
with the value 'Returned' if the Order ID is present in df_returns and 'Not returned' if not.
Here are samples of both DataFrames:
df_order= {'City':['Prior Lake','Chicago','NY','Prior Lake', 'Round Rock'],
'Order ID':[86838 ,90154,15000,10000, 12447]}
df_return= {'Order ID':[90154, 86838 ],
'Returned':['Returned', 'Returned']}
# Create DataFrame from dict
df_orders = pd.DataFrame.from_dict(df_order)
df_returns = pd.DataFrame.from_dict(df_return)
This is how i thought it should be checked but it is definitely not correct cause everywhere says "not returned", but I've checked manually and seen that some orders are matching. Please, help me out.
excel_path = r'C:\Users\Korisnik\Desktop\PythonFiles\Omega\SuperStoreUS.xlsx'
df = pd.read_excel(excel_path, sheet_name=None)
# 1.
df_order = df.get('Orders')
df_returns = df.get('Returns')
df_users = df.get('Users')
df_n.reset_index(drop=True)
df_returns.reset_index(drop=True)
df_n['Status'] = np.where( df_n['Order ID'].equals(df_returns['Order ID']) and df_returns["Status"] == "Returned", "Returned", "Not returned")
You can use pandas.DataFrame.merge
with pandas.Series.fillna
:
df_order = pd.read_excel("SuperStoreUS.xlsx", sheet_name="Orders")
df_return = pd.read_excel("SuperStoreUS.xlsx", sheet_name="Returns")
Use either :
# --- To create a new dataframe
out = df_order.merge(df_return, on="Order ID", how="left")
out["Status"] = out["Status"].fillna("Not Returned")
Or:
# --- To update df_order
df_order = df_order.merge(df_return, on="Order ID", how="left")
df_order["Status"] = df_order["Status"].fillna("Not Returned")