I am comparing the content of two excel sheets that I have turned into a pandas data frame and placed the columns next to each other.
I have created some code that compares the two columns and gives the output below, but some of the strings contain large amounts of text so I would like to show only the difference.
+---------------------+-------------------------+---------------------------------------------+
| Old | New | Changes |
+---------------------+-------------------------+---------------------------------------------+
| Apple, Egg, Ham | Apple, Egg, Norway, Ham | Apple, Egg, Ham --> Apple, Egg, Norway, Ham |
| Instagram, Facebook | Instagram, Twitter | Instagram, Facebook --> Instagram, Twitter |
+---------------------+-------------------------+---------------------------------------------+
The optimal result would look something like this:
+---------------------+-------------------------+---------------------+
| Old | New | Changes |
+---------------------+-------------------------+---------------------+
| Apple, Egg, Ham | Apple, Egg, Norway, Ham | +Norway |
| Instagram, Facebook | Instagram, Twitter | +Twitter, -Facebook |
+---------------------+-------------------------+---------------------+
Norway has been added in row 1, Twitter has been added to row 2, and Facebook has been removed from row 2.
How can I solve this?
Convert values to sets and then use their difference, also add +
and -
in f-string
s and join by ,
in last step:
def f(x):
old, new = set(x['Old'].split(', ')), set(x['New'].split(', '))
d = old.difference(new)
e = new.difference(old)
return ', '.join([f'+{y}' for y in e] + [f'-{y}' for y in d])
df['Changes'] = df.apply(f, axis=1)
print (df)
Old New Changes
0 Apple, Egg, Ham Apple, Egg, Norway, Ham +Norway
1 Instagram, Facebook Instagram, Twitter +Twitter, -Facebook