Python class variable assignment using dictionary comprehension


During a class definition, a class variable defined as a dictionary is used in the construction of a second dictionary class variable, a subset pared down from the first, like this:

class C(object):
    ALL_ITEMS = dict(a='A', b='B', c='C', d='D', e='E')
    SUBSET_X = {k: v for k, v in ALL_ITEMS.items() if k in ('a', 'b', 'd')}  # (this works)
    SUBSET_Y = {k: ALL_ITEMS[k] for k in ('a', 'b', 'd')}  # (this fails)

Pretty simple stuff, but the net effect of executing this code is quite surprising to me. My first approach was the code on line 4 (the SUBSET_Y assignment), but I had to resort to the solution on line 3 (the SUBSET_X assignment) instead. There is something subtle about dictionary comprehension scoping rules that I'm clearly failing to grasp.

Specifically, the error raised in the failing case is:

File "goofy.py", line 4, in <dictcomp>
   SUBSET_Y = {k: ALL_ITEMS.get(k) for k in ('a', 'b', 'd')}
NameError: name 'ALL_ITEMS' is not defined

The nature of this error is baffling to me for a few different reasons:

  1. The assignment to SUBSET_Y is a well-formed dictionary comprehension, and references a symbol which should be in-scope and accessible.
  2. In the succeeding case (the assignment to SUBSET_X), which is also a dictionary comprehension, the symbol ALL_ITEMS is perfectly well-defined and accessible. Thus, the fact that the raised exception is a NameError in the failing case seems manifestly wrong. (Or misleading, at best.)
  3. Why would the scoping rules differ for items() vs. __getitem__ or get()? (The same exception occurs replacing ALL_ITEMS[k] with ALL_ITEMS.get(k) in the failure case.)

(Even as a Python developer for over a decade, I've never run into this failure before, which either means I've been lucky or have lived a sheltered existence :^)

The same failure occurs in various 3.6.x CPython versions as well as 2.7.x versions.

EDIT: No, this is not a duplicate of a previous question. That pertained to list comprehensions, and even if one were to project the same explanation to dictionary comprehensions, it doesn't explain the difference between the two cases I cited. And also, it is not a Python 3-only phenomenon.


Solution

  • One minor detail explains why the first version works but the second fails. The second version fails for the same reason given in this question: every comprehension construct creates a function scope in which all of its local name bindings occur. (That holds for all comprehensions in Python 3; in Python 2, list comprehensions were implemented differently, but dictionary comprehensions already had their own scope.) However, names in a class scope are not accessible to functions defined inside the class scope. This is why you have to use either self.MY_CLASS_VAR or MyClass.MY_CLASS_VAR to access a class variable from a method.

    The reason your first case happens to work is subtle. According to the language reference:

    The comprehension consists of a single expression followed by at least one for clause and zero or more for or if clauses. In this case, the elements of the new container are those that would be produced by considering each of the for or if clauses a block, nesting from left to right, and evaluating the expression to produce an element each time the innermost block is reached.

    However, aside from the iterable expression in the leftmost for clause, the comprehension is executed in a separate implicitly nested scope. This ensures that names assigned to in the target list don’t “leak” into the enclosing scope.

    The iterable expression in the leftmost for clause is evaluated directly in the enclosing scope and then passed as an argument to the implicitly nested scope.

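    So, in the first case, ALL_ITEMS.items() is the iterable of the leftmost for clause. It is therefore evaluated directly in the enclosing scope, which here is the class scope, and it happily finds the ALL_ITEMS name. In the second case, ALL_ITEMS[k] is part of the element expression, which runs inside the implicitly nested function scope. Class-level names are not visible there, hence the NameError.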
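
The difference can be made concrete by writing out by hand roughly what each comprehension does. This is only a sketch, assuming Python 3: the helper names _subset_x and _subset_y are hypothetical stand-ins for the anonymous function that appears as <dictcomp> in the traceback, and SUBSET_Y_ERROR and SUBSET_Y_FIXED are names added purely for illustration.

```python
class C(object):
    ALL_ITEMS = dict(a='A', b='B', c='C', d='D', e='E')

    # Roughly what SUBSET_X's comprehension does: the implicit function only
    # receives an iterable. ALL_ITEMS.items() is evaluated at the call site,
    # which is the class body, where ALL_ITEMS is visible.
    def _subset_x(iterable):  # hypothetical name for the implicit <dictcomp>
        result = {}
        for k, v in iterable:
            if k in ('a', 'b', 'd'):
                result[k] = v
        return result

    SUBSET_X = _subset_x(ALL_ITEMS.items())

    # Roughly what SUBSET_Y's comprehension does: ALL_ITEMS[k] sits in the
    # element expression, so it runs inside the function, which skips the
    # class scope and looks for a global named ALL_ITEMS instead.
    def _subset_y(iterable):  # hypothetical name, as above
        result = {}
        for k in iterable:
            result[k] = ALL_ITEMS[k]
        return result

    try:
        SUBSET_Y = _subset_y(('a', 'b', 'd'))
    except NameError as exc:
        SUBSET_Y_ERROR = str(exc)  # kept only so the failure can be inspected

    # One possible workaround: pass the dict in through the leftmost
    # iterable, which is still evaluated in the class body.
    SUBSET_Y_FIXED = {k: items[k] for items in [ALL_ITEMS] for k in ('a', 'b', 'd')}

    del _subset_x, _subset_y  # remove the illustration helpers from the class
```

Another option, if the single-element list feels too clever, is to assign the subset after the class body, where C.ALL_ITEMS can be referenced explicitly.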