Tags: python, python-2.7, python-3.x, dictionary, comparison

Use Python 2 Dict Comparison in Python 3


I'm trying to port some code from Python 2 to Python 3. It's ugly stuff but I'm trying to get the Python 3 results to be as identical to the Python 2 results as possible. I have code similar to this:

import json

# Read a list of json dictionaries by line from file.

objs = []
with open('data.txt') as fptr:
    for line in fptr:
        objs.append(json.loads(line))

# Give the dictionaries a reliable order.

objs = sorted(objs)

# Do something externally visible with each dictionary:

for obj in objs:
    do_stuff(obj)

When I port this code from Python 2 to Python 3, I get an error:

TypeError: unorderable types: dict() < dict()

So I changed the sorted line to this:

objs = sorted(objs, key=id)

But the ordering of the dictionaries still changed between Python 2 and Python 3.

Is there a way to replicate the Python 2 comparison logic in Python 3? Is it simply that id was used before and is not reliable between Python versions?


Solution

  • If you want the same behavior as earlier versions of Python 2.x in both 2.7 (which uses an arbitrary sort order instead) and 3.x (which refuses to sort dicts), Ned Batchelder's answer to a question about how sorting dicts works gets you part of the way there, but not all the way.


    First, it gives you an old-style cmp function, not a new-style key function. Fortunately, both 2.7 and 3.x have functools.cmp_to_key to solve that. (You could of course instead rewrite the code as a key function, but that may make it harder to see any differences between the posted code and your code…)
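
As a minimal sketch of how functools.cmp_to_key bridges the two styles, the snippet below wraps an old-style comparison function so that sorted can use it as a key. The function by_length_then_value and the sample list are made up for illustration; they are not from the linked answer.

```python
from functools import cmp_to_key

def by_length_then_value(a, b):
    """Old-style comparison: negative, zero, or positive result."""
    if len(a) != len(b):
        return len(a) - len(b)
    # Same length: fall back to the values' natural ordering.
    return (a > b) - (a < b)

words = ["pear", "fig", "apple", "kiwi"]

# cmp_to_key turns the two-argument function into a key callable.
ordered = sorted(words, key=cmp_to_key(by_length_then_value))
print(ordered)
```

The same call works unchanged in 2.7 and 3.x, which is why it is a convenient way to keep a cmp-style function while porting.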


    More importantly, it not only doesn't do the same thing in 2.7 and 3.x, it doesn't even work in 2.7 and 3.x. To understand why, look at the code:

    def smallest_diff_key(A, B):
        """return the smallest key adiff in A such that A[adiff] != B[bdiff]"""
        diff_keys = [k for k in A if A.get(k) != B.get(k)]
        return min(diff_keys)
    
    def dict_cmp(A, B):
        if len(A) != len(B):
            return cmp(len(A), len(B))
        adiff = smallest_diff_key(A, B)
        bdiff = smallest_diff_key(B, A)
        if adiff != bdiff:
            return cmp(adiff, bdiff)
        return cmp(A[adiff], B[bdiff])
    

    Notice that it's calling cmp on the mismatched values.

    If the dicts can contain other dicts, that's relying on the fact that cmp(d1, d2) is going to end up calling this function… which is obviously not true in newer Python.

    On top of that, in 3.x cmp doesn't even exist anymore.
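
For values that are already orderable, the missing built-in can be approximated with the expression that the Python 3.0 "What's New" notes suggest, (a > b) - (a < b). This is only a sketch of a drop-in shim: it still raises TypeError for values that Python 3 refuses to order, so it does not address the other problems discussed here.

```python
def cmp(a, b):
    """Return -1, 0, or 1, like the Python 2 built-in did for orderable values."""
    return (a > b) - (a < b)

print(cmp(1, 2), cmp(2, 2), cmp(3, 2))
```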

    Also, this relies on the fact that any value can be compared with any other value—you might get back arbitrary results, but you won't get an exception. That was true (except in a few rare cases) in 2.x, but it's not true in 3.x. That may not be a problem for you if you don't want to compare dicts with non-comparable values (e.g., if it's OK for {1: 2} < {1: 'b'} to raise an exception), but otherwise, it is.

    And of course if you don't want arbitrary results for dict comparison, do you really want arbitrary results for value comparisons?
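
To make the point about mixed-type values concrete, here is a small check, assuming it is run under Python 3. The helper try_less_than is hypothetical and exists only to capture the exception instead of letting it propagate.

```python
def try_less_than(a, b):
    """Return a < b, or the string 'TypeError' if the values are unorderable."""
    try:
        return a < b
    except TypeError:
        return "TypeError"

# The mismatched values from {1: 2} vs {1: 3}, then {1: 2} vs {1: 'b'}.
same_type = try_less_than({1: 2}[1], {1: 3}[1])     # int vs int
mixed_type = try_less_than({1: 2}[1], {1: "b"}[1])  # int vs str
print(same_type, mixed_type)
```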

    The solution to all three problems is simple: you have to replace cmp, instead of calling it, and have dict_cmp call the replacement wherever it called cmp. So, something like this:

    def mycmp(A, B):
        if isinstance(A, dict) and isinstance(B, dict):
            return dict_cmp(A, B)
        try:
            # cmp-style result: -1, 0, or 1
            return (A > B) - (A < B)
        except TypeError:
            # what goes here depends on how far you want to go for consistency;
            # re-raising is the simplest choice
            raise
    

    If you want the exact rules for comparison of objects of different types that 2.7 used, they're documented, so you can implement them. But if you don't need that much detail, you can write something simpler here (or maybe even just not trap the TypeError, if the exception mentioned above is acceptable).
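
Putting the pieces together, one possible end-to-end sketch follows. It is not the linked answer's code: the helpers type_rank and diff_keys are hypothetical names, equal dicts are handled explicitly (they return 0), and the TypeError fallback only approximates what I understand CPython 2 to do for mismatched types (None first, then numbers, then everything else by type name). Values of the same type that still cannot be ordered re-raise the TypeError.

```python
from functools import cmp_to_key

def type_rank(value):
    # Assumed approximation of CPython 2's cross-type ordering:
    # None sorts first, numbers next, other types by type name.
    if value is None:
        return (0, "")
    if isinstance(value, (int, float)):
        return (1, "")
    return (2, type(value).__name__)

def mycmp(a, b):
    """cmp-style comparison (-1, 0, 1) that recurses into dicts."""
    if isinstance(a, dict) and isinstance(b, dict):
        return dict_cmp(a, b)
    try:
        return (a > b) - (a < b)
    except TypeError:
        ra, rb = type_rank(a), type_rank(b)
        if ra == rb:
            raise  # same kind of value, still unorderable: give up
        return (ra > rb) - (ra < rb)

def diff_keys(a, b):
    """Keys of a that are missing from b or map to a different value."""
    return [k for k in a if k not in b or a[k] != b[k]]

def dict_cmp(a, b):
    if len(a) != len(b):
        return mycmp(len(a), len(b))
    a_diffs = diff_keys(a, b)
    if not a_diffs:
        return 0  # same length and no differing key: the dicts are equal
    adiff = min(a_diffs, key=cmp_to_key(mycmp))
    bdiff = min(diff_keys(b, a), key=cmp_to_key(mycmp))
    key_order = mycmp(adiff, bdiff)
    if key_order != 0:
        return key_order
    return mycmp(a[adiff], b[bdiff])

objs = [{"b": 1}, {"a": 2, "b": 1}, {"a": {"x": 2}}, {"a": {"x": 1}}, {"a": 1}]
ordered = sorted(objs, key=cmp_to_key(mycmp))
print(ordered)
```

With this in place, the sorted line from the question becomes sorted(objs, key=cmp_to_key(mycmp)). Whether the resulting order matches a particular Python 2 run for every input is not something this sketch guarantees; comparing its output against the old program on real data is the safest check.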