Irregularity in Python class variable assignment


I'm trying to create a class which has multiple class-level variables, some of which have calculated values which reference previously declared class-level variables. However, I'm having difficulty referencing the variables at certain points.

My first attempt:

#!/usr/bin/env python
from decimal import Decimal
import math

class Foo(object):
    NUM_BUCKETS = 10
    BUCKET_SIZE = Decimal(1.0 / NUM_BUCKETS)
    BUCKET_LABELS = tuple("BUCKET_{}".format(int(BUCKET_SIZE * i * 100)) for i in xrange(1, NUM_BUCKETS + 1))

print Foo.BUCKET_LABELS

Result:

> python test.py
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    class Foo(object):
  File "test.py", line 8, in Foo
    BUCKET_LABELS = tuple("BUCKET_{}".format(int(BUCKET_SIZE * i * 100)) for i in xrange(1, NUM_BUCKETS + 1))
  File "test.py", line 8, in <genexpr>
    BUCKET_LABELS = tuple("BUCKET_{}".format(int(BUCKET_SIZE * i * 100)) for i in xrange(1, NUM_BUCKETS + 1))
NameError: global name 'BUCKET_SIZE' is not defined

Trying to access the class variable via the class name doesn't work either:

#!/usr/bin/env python
from decimal import Decimal
import math

class Foo(object):
    NUM_BUCKETS = 10
    BUCKET_SIZE = Decimal(1.0 / NUM_BUCKETS)
    BUCKET_LABELS = tuple("BUCKET_{}".format(int(Foo.BUCKET_SIZE * i * 100)) for i in xrange(1, NUM_BUCKETS + 1))

print Foo.BUCKET_LABELS

Result:

> python test2.py
Traceback (most recent call last):
  File "test2.py", line 5, in <module>
    class Foo(object):
  File "test2.py", line 8, in Foo
    BUCKET_LABELS = tuple("BUCKET_{}".format(int(Foo.BUCKET_SIZE * i * 100)) for i in xrange(1, NUM_BUCKETS + 1))
  File "test2.py", line 8, in <genexpr>
    BUCKET_LABELS = tuple("BUCKET_{}".format(int(Foo.BUCKET_SIZE * i * 100)) for i in xrange(1, NUM_BUCKETS + 1))
NameError: global name 'Foo' is not defined

Replacing the reference to BUCKET_SIZE with a hardcoded value fixes the problem; even though there's another class-level variable reference in the same line, it works just fine:

#!/usr/bin/env python
from decimal import Decimal
import math

class Foo(object):
    NUM_BUCKETS = 10
    BUCKET_SIZE = Decimal(1.0 / NUM_BUCKETS)
    BUCKET_LABELS = tuple("BUCKET_{}".format(int(Decimal(0.1) * i * 100)) for i in xrange(1, NUM_BUCKETS + 1))

print Foo.BUCKET_LABELS

Result:

> python test3.py
('BUCKET_10', 'BUCKET_20', 'BUCKET_30', 'BUCKET_40', 'BUCKET_50', 'BUCKET_60', 'BUCKET_70', 'BUCKET_80', 'BUCKET_90', 'BUCKET_100')

Does anybody know the correct way of referencing BUCKET_SIZE in that spot? Is this a bug in Python itself? (I'm running Python 2.7.5, BTW.)


Solution

  • First, one possible fix among others: change just this one line (note the square brackets):

    BUCKET_LABELS = tuple(["BUCKET_{}".format(int(BUCKET_SIZE * i * 100)) for i in xrange(1, NUM_BUCKETS + 1)]) 
    

    Now, if you are curious why Python behaves this way, and why it is not a bug: well, it is not an easy one :-)

    [i*2 for i in xrange(3)] is a list comprehension. It builds an actual list, which can be used like this, for example:

    >>> a = [i*2 for i in xrange(3)]
    >>> a
    [0, 2, 4]
    

    (i*2 for i in xrange(3)) is a generator expression. It looks similar, but it does not build a list or a tuple. It returns a generator, which produces its values lazily, one at a time:

    >>> a = (i*2 for i in xrange(3))
    >>> a
    <generator object <genexpr> at 0x02CEE058>
    >>> a.next()
    0
    >>> a.next()
    2
    >>> a.next()
    4
    >>> a.next()
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
    StopIteration
    >>> a = (i*2 for i in xrange(3))
    >>> tuple(a)
    (0, 2, 4)
    >>> tuple(a)
    ()
    

    If you are curious, you can find more information by reading up on generator expressions.

    The tl;dr version is that a generator cannot be accessed directly (you must ask it to generate its content, using next() for example), and that each value can only be generated once (and then the generator moves on to the next one, hence the next() function name).

    So, coming back to your problem. In the expression below, you build a tuple from a generator expression, which is fine in itself. However, the body of that generator expression refers to a class variable of Foo, and that is where the trouble starts.

    BUCKET_LABELS = tuple("BUCKET_{}".format(int(BUCKET_SIZE * i * 100)) for i in xrange(1, NUM_BUCKETS + 1))
    

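    In particular, a generator expression runs in its own nested scope, and names defined in a class body are not visible from scopes nested inside it. When tuple() asks the generator for its values, the lookup of BUCKET_SIZE skips the class body and falls through to the global scope, which is why the error says "global name 'BUCKET_SIZE' is not defined". The name Foo is not bound until the class body has finished executing, so Foo.BUCKET_SIZE fails in the same way. In Python 2, a list comprehension does not create a new scope: it runs directly in the class body, where BUCKET_SIZE is visible.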
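
    The scoping rule can be checked directly. This is a minimal sketch using a hypothetical Demo class (not from the question); it uses range instead of xrange so that it runs on both Python 2.7 and Python 3:

```python
class Demo(object):
    SIZE = 3
    COUNT = 2

    # The outermost iterable, range(COUNT), is evaluated in the class body,
    # so this generator expression works.
    FROM_ITERABLE = tuple(i for i in range(COUNT))

    # SIZE is looked up inside the generator's own scope, which cannot see
    # names defined in the class body, so this raises NameError.
    try:
        FROM_BODY = tuple(SIZE * i for i in range(COUNT))
    except NameError as exc:
        ERROR = str(exc)
```

    After the class is created, Demo.FROM_ITERABLE is (0, 1), Demo.FROM_BODY was never assigned, and Demo.ERROR holds the NameError message, which mentions SIZE.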

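    So one solution is simply to use a list comprehension instead (which is easier to handle and more intuitive anyway). Note that this relies on Python 2 behavior, which is what you are running: in Python 3, list comprehensions get their own scope too, and the same NameError appears.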

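    PS: Decimal(1.0 / NUM_BUCKETS) probably does not do what you think it does. The division is carried out in binary floating point first, so Decimal faithfully records the float's rounding error: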

    >>> NUM_BUCKETS = 10
    >>> print Decimal(1.0 / NUM_BUCKETS)
    0.1000000000000000055511151231257827021181583404541015625
    >>> print round(1.0 / NUM_BUCKETS, 2)
    0.1
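
    If the goal is an exact 0.1 stored as a Decimal, one option is to do the division in decimal arithmetic. A small sketch, where float_size and exact_size are names chosen for illustration only:

```python
from decimal import Decimal

NUM_BUCKETS = 10

# The division happens in binary floating point, and the inexact
# result is then converted digit for digit.
float_size = Decimal(1.0 / NUM_BUCKETS)

# The division happens in decimal arithmetic, so the result is exactly 0.1.
exact_size = Decimal(1) / NUM_BUCKETS
```

    Decimal("0.1"), built from a string, would work just as well when the value is a fixed constant.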
    

    PPS: if you are curious why the xrange(1, NUM_BUCKETS + 1) part does not raise an error: the outermost iterable of a generator expression is evaluated immediately, in the enclosing scope (here, the class body), and the resulting object is passed into the generator. NUM_BUCKETS is therefore looked up while the class body is still the current scope, and the generator never has to resolve that name itself.
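
    If you would rather keep a generator expression, or need code that also runs on Python 3, one option is to pass the class-level values into a function as arguments. The sketch below assumes a hypothetical helper named make_labels (not part of the question), and uses range and Decimal(1) / NUM_BUCKETS in place of the original xrange and Decimal(1.0 / NUM_BUCKETS):

```python
from decimal import Decimal


def make_labels(bucket_size, num_buckets):
    # bucket_size and num_buckets are ordinary function arguments here,
    # so the generator expression can see them through normal closure rules.
    return tuple(
        "BUCKET_{}".format(int(bucket_size * i * 100))
        for i in range(1, num_buckets + 1)
    )


class Foo(object):
    NUM_BUCKETS = 10
    BUCKET_SIZE = Decimal(1) / NUM_BUCKETS
    # The arguments are evaluated in the class body, where both names exist.
    BUCKET_LABELS = make_labels(BUCKET_SIZE, NUM_BUCKETS)
```

    Foo.BUCKET_LABELS then holds the same ten labels as the hardcoded version in the question, from 'BUCKET_10' to 'BUCKET_100'.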