Tags: python, gis, arcpy, difflib, sequencematcher

difflib.SequenceMatcher not returning unique ratio


I am trying to compare two street networks. When I run this code, it returns a ratio of .253529... I need it to compare each row so that I get a separate value per row, which I can then use to query out the streets that don't match. How can I get it to return a unique ratio value for each row?

import difflib
import arcpy

# Set local variables
# (gp is assumed to be the geoprocessor object created earlier in the script)
inFeatures = gp.GetParameterAsText(0)
fieldName = gp.GetParameterAsText(1)
fieldName1 = gp.GetParameterAsText(2)
fieldName2 = gp.GetParameterAsText(3)
expression = difflib.SequenceMatcher(None, fieldName1, fieldName2).ratio()

# Execute CalculateField
arcpy.CalculateField_management(inFeatures, fieldName, expression, "PYTHON_9.3")
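The single ratio comes from what the code actually compares: `GetParameterAsText` returns the field *names*, so `SequenceMatcher` runs once on the two name strings, and that one number is what gets written to every row. The pure-Python sketch below contrasts that with comparing the two field *values* row by row, then picking out rows that fall below a cutoff. It needs no ArcGIS; the field names, the rows, and the 0.9 threshold are all hypothetical.

```python
import difflib

# Hypothetical field names and rows standing in for the feature class
field1, field2 = "STREET_A", "STREET_B"
rows = [
    {"STREET_A": "MAIN ST", "STREET_B": "MAIN ST"},
    {"STREET_A": "OAK AVE", "STREET_B": "OAK AVENUE"},
    {"STREET_A": "ELM RD", "STREET_B": "PINE DR"},
]

# What the question's code does: compare the two field *names* once,
# giving one number that does not depend on any row
constant = difflib.SequenceMatcher(None, field1, field2).ratio()

# What is wanted: compare the two field *values* on each row
per_row = [
    difflib.SequenceMatcher(None, row[field1], row[field2]).ratio()
    for row in rows
]

# Hypothetical cutoff: rows scoring below it are treated as non-matching
THRESHOLD = 0.9
mismatches = [row for row, r in zip(rows, per_row) if r < THRESHOLD]
```

Here `constant` is 0.875 regardless of the data, while `per_row` holds a different score for each row (1.0 for the identical pair). In ArcGIS itself the same per-row comparison would have to operate on the field values, for example through a cursor or a Field Calculator code block; that part is not shown here.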


Solution

  • If you know both files always have the exact same number of lines, a simple approach like this would work:

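    import difflib

    # file1 and file2 are placeholders for the paths to the two text files
    ratios = []
    
    with open(file1, 'r') as f1, open(file2, 'r') as f2:
        for l1, l2 in zip(f1, f2):
            # Strip the trailing newline so it doesn't inflate the ratio
            l1, l2 = l1.rstrip('\n'), l2.rstrip('\n')
            R = difflib.SequenceMatcher(None, l1, l2).ratio()
            ratios.append((l1, l2, R))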
    

    This will produce a list of tuples like this:

    [("aa", "aa", 1.0), ("aa", "ab", 0.5), ...]
    

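    If your files are different sizes, note that zip() stops at the end of the shorter file, so any extra lines in the longer one are silently skipped. You'll need some way to match up the lines, or to handle the leftovers otherwise.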
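One way to handle the leftovers, sketched under the assumption that lines should still be paired purely by position, is `itertools.zip_longest`, which keeps going until the longer input runs out and pads the shorter one with a fill value. Padding with an empty string gives every unpaired line a ratio of 0.0, so it shows up as a mismatch. The function name `line_ratios` is hypothetical.

```python
import difflib
from itertools import zip_longest


def line_ratios(lines1, lines2):
    """Pair lines by position and score each pair with SequenceMatcher.

    Works on lists of strings or on open file objects. Missing lines in
    the shorter input are replaced by "" and therefore score 0.0.
    """
    ratios = []
    for l1, l2 in zip_longest(lines1, lines2, fillvalue=""):
        # Drop trailing newlines read from files before comparing
        l1, l2 = l1.rstrip("\n"), l2.rstrip("\n")
        r = difflib.SequenceMatcher(None, l1, l2).ratio()
        ratios.append((l1, l2, r))
    return ratios


# Example: the second input is one line short
print(line_ratios(["aa", "ab", "cc"], ["aa", "aa"]))
```

This only deals with unequal lengths; it does not re-align lines that have shifted. If one file has an extra street in the middle, every pair after it is offset and scores low, so matching on a shared key would be more robust in that case.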