
Python - compare two strings by words using difflib and print only the differences


Python newbie here. I have the following code to compare two strings using the difflib library. The output is prefixed with '+' or '-' for words that differ. How can I print only the differences, without any prefix?

The expected output for the below code is

Not in first string: Nvdia

Not in first string: IBM

Not in second string: Microsoft

Not in second string: Google

Not in second string: Oracle

or just Nvdia, IBM, Microsoft, Google, Oracle

import difflib

original = "Apple Microsoft Google Oracle"
edited = "Apple Nvdia IBM"

# initiate the Differ object
d = difflib.Differ()

# calculate the difference between the two texts
diff = d.compare(original.split(), edited.split())

# output the result
print('\n'.join(diff))

Thanks!


Solution

  • If you don't have to use difflib, you could use a set and string splitting!

    >>> original = "Apple Microsoft Google Oracle"
    >>> edited = "Apple Nvdia IBM"
    >>> set(original.split()).symmetric_difference(set(edited.split()))
    {'IBM', 'Google', 'Oracle', 'Microsoft', 'Nvdia'}
    

    You can also get the shared members with .intersection()

    >>> set(original.split()).intersection(set(edited.split()))
    {'Apple'}
    

    Wikipedia has a good section on basic set operations, with accompanying Venn diagrams:
    https://en.wikipedia.org/wiki/Set_(mathematics)#Basic_operations
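
If you also want the "Not in first string" / "Not in second string" labels from the question, and the words in their original order, one possible approach is to keep the sets only for membership tests and build the results with list comprehensions. This is a minimal sketch; the variable names (`not_in_first`, `not_in_second`, and so on) are made up for illustration.

```python
original = "Apple Microsoft Google Oracle"
edited = "Apple Nvdia IBM"

original_words = original.split()
edited_words = edited.split()

# Sets give fast membership tests; the list comprehensions keep word order.
original_set = set(original_words)
edited_set = set(edited_words)

# Words that appear only in the edited (second) string.
not_in_first = [w for w in edited_words if w not in original_set]
# Words that appear only in the original (first) string.
not_in_second = [w for w in original_words if w not in edited_set]

for word in not_in_first:
    print(f"Not in first string: {word}")
for word in not_in_second:
    print(f"Not in second string: {word}")
```

Unlike a plain set, this keeps a repeated word as many times as it occurs in its source string, which may or may not be what you want.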


    However, if you have to use difflib (some strange environment or an assignment), you can find every entry with a "+" or "-" prefix and slice the two-character prefix off

    >>> diff = d.compare(original.split(), edited.split())
    >>> list(a[2:] for a in diff if a.startswith(("+", "-")))
    ['Nvdia', 'IBM', 'Microsoft', 'Google', 'Oracle']
    

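    All of these operations result in an iterable of strings, so you can .join() them to get a single result, as you do in your question. Below, result holds the symmetric difference from the first example; sets are unordered, so the lines may come out in a different order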

    >>> print("\n".join(result))
    IBM
    Google
    Oracle
    Microsoft
    Nvdia
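
If you need to stay with difflib and want the labeled output from the question, the prefix itself can be mapped to a label. The following is a rough sketch, not the only way to do it; `labeled_differences` and the `labels` dictionary are hypothetical names introduced here.

```python
import difflib


def labeled_differences(original, edited):
    """Return labeled lines for words that Differ marks as added or removed."""
    # Differ prefixes each entry with a two-character code.
    labels = {"+ ": "Not in first string", "- ": "Not in second string"}
    lines = []
    for entry in difflib.Differ().compare(original.split(), edited.split()):
        prefix, word = entry[:2], entry[2:]
        # Skip "  " (unchanged) entries and "? " (hint) entries.
        if prefix in labels:
            lines.append(f"{labels[prefix]}: {word}")
    return lines


print("\n".join(labeled_differences("Apple Microsoft Google Oracle", "Apple Nvdia IBM")))
```

Keep in mind that Differ compares sequences by position, so a word that merely moves to a different place can be reported as both removed and added. The set-based approach ignores position entirely.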