I am comparing two dataframes by looping through both and determining if data (row) from one does not exist in the other. If there is no match, I want to add that specific row to a third dataframe. I was able to achieve this with the append() method, but Python warns me that it is deprecated and tells me to use concat instead. Unfortunately, I cannot get concat to achieve the same result.
import pandas as pd
df1 = pd.DataFrame(
{
"Animal": ["Corgi",
"Labrador",
"Orange Cat",
"Black Cat"],
"Owner": ["Bob",
"Joe",
"Sara",
"Glen"]
})
df2 = pd.DataFrame(
{
"Animal": ["Corgi",
"Labrador",
"Black Cat"],
"Owner": ["Bob",
"Joe",
"Glen"]
})
dfF = pd.DataFrame(columns=["Animal", "Owner"])
df1Sort = df1.sort_values(by=["Animal"]).reset_index(drop=True)
df2Sort = df2.sort_values(by=["Animal"]).reset_index(drop=True)
for ind in df1Sort.index:
for subInd in df2Sort.index:
if df1Sort["Animal"][ind] == df2Sort["Animal"][subInd]:
print("Found match:",
df1Sort["Animal"][ind],
"-",
df1Sort["Owner"][ind])
break
elif df2Sort.index[subInd] == df2Sort.index[-1]:
print("No match for:",
df1Sort["Animal"][ind],
"-",
df1Sort["Owner"][ind])
dfF = pd.concat([dfF, df1Sort.iloc[ind]], axis=0, ignore_index=True)
break
print(dfF)
How do I alter the concat line to achieve this?
This issue happens because df1Sort.iloc[ind]
returns a Series, not a Dataframe, and pd.concat
expects a Dataframe. You need to wrap the Series in a DataFrame to ensure proper concatenation.
dfF = pd.concat([dfF, pd.DataFrame([df1Sort.iloc[ind]])], ignore_index=True)
Here's the updated code
import pandas as pd
df1 = pd.DataFrame({
"Animal": ["Corgi", "Labrador", "Orange Cat", "Black Cat"],
"Owner": ["Bob", "Joe", "Sara", "Glen"]
})
df2 = pd.DataFrame({
"Animal": ["Corgi", "Labrador", "Black Cat"],
"Owner": ["Bob", "Joe", "Glen"]
})
dfF = pd.DataFrame(columns=["Animal", "Owner"])
df1Sort = df1.sort_values(by=["Animal"]).reset_index(drop=True)
df2Sort = df2.sort_values(by=["Animal"]).reset_index(drop=True)
for ind in df1Sort.index:
for subInd in df2Sort.index:
if df1Sort["Animal"][ind] == df2Sort["Animal"][subInd]:
print("Found match:", df1Sort["Animal"][ind], "-", df1Sort["Owner"][ind])
break
elif df2Sort.index[subInd] == df2Sort.index[-1]:
print("No match for:", df1Sort["Animal"][ind], "-", df1Sort["Owner"][ind])
dfF = pd.concat([dfF, pd.DataFrame([df1Sort.iloc[ind]])], ignore_index=True)
break
print(dfF)
I hope this will help you a little.