python biginteger representation

__repr__ for (large) composite objects


I would like to have informative representations for my composite objects (i.e., objects composed of other (potentially composite) objects). However, because my code fundamentally deals with high-precision numbers (please don't ask me why I don't just use doubles), I end up with representations like the ones you see here: http://pastebin.com/jpLgAfxC. Would it be better to just stick with the default __repr__?


Solution

  • Whether to have a verbose repr depends on what you want to accomplish. For complex or composite objects, I know which I'd prefer of the following:

    Point(x=1.12, y=2.2, z=-1.9)
    <__main__.Point object at 0x103011890>
    

    They both tell me what type the object is, but only the first is clear about all of the (relevant) values involved, and avoids low-level information that is only relevant on the rarest of occasions.
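
To make the comparison concrete, here is a minimal sketch of how the two outputs above come about. The `Point` and `BarePoint` classes are hypothetical stand-ins, not code from the question: one defines a value-based `__repr__`, the other falls back to the default inherited from `object`.

```python
class Point:
    """Hypothetical 3-D point, used only to illustrate a value-based repr."""

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    def __repr__(self):
        # Name the type and show every relevant field.
        return "{0}(x={1!r}, y={2!r}, z={3!r})".format(
            type(self).__name__, self.x, self.y, self.z
        )


class BarePoint:
    """Same data, but relying on the default object.__repr__."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


print(repr(Point(1.12, 2.2, -1.9)))      # Point(x=1.12, y=2.2, z=-1.9)
print(repr(BarePoint(1.12, 2.2, -1.9)))  # type name plus a memory address that varies per run
```

Using `type(self).__name__` rather than a hard-coded name keeps the repr accurate for subclasses.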

    I like to see the real values. But, yours is a special case, given that your values are so frightfully humongous:

    72401317106217603290426741268390656010621951704689382948334809645
    87850348552960901165648762842931879347325584704068956434195098288
    38279057775096090002410493665682226331178331461681861612403032369
    73237863637784679012984303024949059416189689048527978878840119376
    5152408961823197987224502419157858495179687559851
    

    Numbers that long cannot be useful for most development or debugging purposes. I'm sure there are times you need the full serialization, to send to and from files, for example. But those have to be fairly rare, no? I can't imagine you really remember all 309 digits, or can determine on visual inspection whether the number above is the same as the one below:

    72401317106217603290426741268390656010621951704689382948334809645
    87850348552960901165648762842931879347325584704068956434195098288
    38279057775096090002410493665682226331178331461681861612403032369
    73327863637784679012984303024949059416189689048527978878840119376
    5152408961823197987224502419157858495179687559851
    

    They're not the same. But unless you're Spock or The Terminator, you wouldn't know that from a quick glance. (And actually, I've made it easier here, length-wrapping to avoid having to horizontally scroll.)

    So I would recommend (massively) shortening their representation, to make the output more tractable. Printing every digit is like printing out the entire chapter text every time you want to print a Chapter object. Overkill.

    Instead, try something much shorter and easier to work with. Truncation and/or ellipsis are useful. e.g.

    72401...59851
    7240131710... 
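
One way to produce the two truncated forms shown above is a small helper along these lines. This is only a sketch: the name `abbreviate` and its `head`/`tail` parameters are made up for illustration, and `N` is the 309-digit example value joined from its wrapped lines.

```python
# The 309-digit example value from above, joined from its wrapped lines.
N = int(
    "72401317106217603290426741268390656010621951704689382948334809645"
    "87850348552960901165648762842931879347325584704068956434195098288"
    "38279057775096090002410493665682226331178331461681861612403032369"
    "73237863637784679012984303024949059416189689048527978878840119376"
    "5152408961823197987224502419157858495179687559851"
)


def abbreviate(n, head=5, tail=5):
    """Return the leading and trailing digits of an int, joined by an ellipsis."""
    digits = str(abs(n))
    sign = "-" if n < 0 else ""
    if len(digits) <= head + tail:
        return sign + digits            # short enough to show in full
    if tail == 0:
        return sign + digits[:head] + "..."   # avoid digits[-0:], which is the whole string
    return sign + digits[:head] + "..." + digits[-tail:]


print(abbreviate(N))                    # 72401...59851
print(abbreviate(N, head=10, tail=0))   # 7240131710...
```

Note that recent Python versions (3.11 and later) refuse by default to convert ints longer than 4300 digits to `str`; a 309-digit value is well under that limit, but much larger values would need `sys.set_int_max_str_digits`.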
    

    You can use the object id as well. If your high-precision type is HP, then:

    HP(0x103011890)
    

    At least then you will be able to tell them apart. One ugliness of using object ids, however, is that objects can be logically equivalent, but if you create multiple objects with the same logical value, they'd have different ids, thus appear different when they are not. You can get around that by creating your own short hash function. There's a bit of an art to hashing, but for reprs, even something simple would work. E.g.:

    import binascii, struct
    
    def shorthash(s):
        """
        Given a Python value, produce a short alphanumeric hash that
        helps identify it for debugging purposes. A riff on 
        http://stackoverflow.com/a/2511059/240490
        Enhanced to remove trailing boilerplate, and to work
        on either Python 2 or Python 3.
        """
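        # 'q' packs a signed 64-bit value on every platform; native 'l' is
        # only 32 bits on Windows and would raise struct.error for large hashes.
        hashbytes = binascii.b2a_base64(struct.pack('q', hash(s)))
        return hashbytes.decode('utf-8').rstrip().rstrip("=")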
    

    Then define your repr in the high-precision class:

    def __repr__(self):
        clsname = self.__class__.__name__
        return '{0}({1})'.format(clsname, shorthash(self.value))
    

    Where self.value is whatever local attribute, property, or method creates the multi-hundred-digit value. If you're subclassing int, this could be just self.

    This gets you to:

    HP(Tea+5MY0WwA)
    

    The two massive, almost identical numbers above? Using this scheme, they render out to:

    HP(XhkG0358Fx4)
    HP(27CdIG5elhQ)
    

    Which are obviously different. You can combine this with a bit of a value representation. E.g. a few alternatives:

    HP(~7.24013e308 @ XhkG0358Fx4)
    HP(dig='72401...59851', ndigits=309, hash='XhkG0358Fx4')
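
A sketch of how both combined forms might be generated, assuming the value is a plain Python `int` stored in `self.value`. The `HP` class, the `approx` helper and the `describe` method are hypothetical names for illustration. The approximation is built from the digit string because `float()` overflows on a value this large. The hash tags this prints depend on the platform's `hash()`, so none are shown as expected output.

```python
import binascii
import struct

# The 309-digit example value from above, joined from its wrapped lines.
N = int(
    "72401317106217603290426741268390656010621951704689382948334809645"
    "87850348552960901165648762842931879347325584704068956434195098288"
    "38279057775096090002410493665682226331178331461681861612403032369"
    "73237863637784679012984303024949059416189689048527978878840119376"
    "5152408961823197987224502419157858495179687559851"
)


def shorthash(value):
    """Short base64 tag derived from hash(value); for debugging only."""
    # 'q' is a signed 64-bit field, wide enough for CPython hash values.
    packed = struct.pack("q", hash(value))
    return binascii.b2a_base64(packed).decode("ascii").rstrip().rstrip("=")


def approx(n, sig=6):
    """Scientific-notation string built from the digits (truncated, not rounded).

    float(n) would raise OverflowError for values above about 1.8e308,
    so the mantissa and exponent are read off the decimal string instead.
    """
    digits = str(abs(n))
    sign = "-" if n < 0 else ""
    if len(digits) <= sig:
        return sign + digits
    return "{0}{1}.{2}e{3}".format(sign, digits[0], digits[1:sig], len(digits) - 1)


class HP:
    """Hypothetical high-precision wrapper around a Python int."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # Compact form: approximate magnitude plus a short hash tag.
        return "{0}(~{1} @ {2})".format(
            type(self).__name__, approx(self.value), shorthash(self.value)
        )

    def describe(self):
        # Keyword-style form: digit excerpt, digit count, and hash tag.
        digits = str(abs(self.value))
        excerpt = digits if len(digits) <= 10 else digits[:5] + "..." + digits[-5:]
        return "{0}(dig={1!r}, ndigits={2}, hash={3!r})".format(
            type(self).__name__, excerpt, len(digits), shorthash(self.value)
        )


print(repr(HP(N)))       # starts with HP(~7.24013e308 @ ...
print(HP(N).describe())  # starts with HP(dig='72401...59851', ndigits=309, hash=...
```

For ints, `hash()` is deterministic across runs, so equal values always get the same tag. If the underlying value were a `str` or `bytes` object instead, Python 3 salts those hashes per process by default, and the tag would only be stable within a single run.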
    

    You'll find these shorter values more useful in debugging contexts. You can, of course, keep around a method or property (e.g. .value, .digits, or .alldigits) for those cases in which you need every last bit, but define the common case as something more easily consumed.
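
As a rough illustration of that split, the sketch below keeps the repr short and exposes the full value through a `digits` property. The class layout is an assumption made for the example, not the asker's actual type.

```python
class HP:
    """Hypothetical wrapper: short repr for everyday use, full digits on request."""

    def __init__(self, value):
        self.value = int(value)

    @property
    def digits(self):
        # Full decimal expansion, for serialization or exact comparison.
        return str(self.value)

    def __repr__(self):
        digits = str(abs(self.value))
        sign = "-" if self.value < 0 else ""
        # Show short values whole; otherwise keep five digits from each end.
        shown = digits if len(digits) <= 12 else digits[:5] + "..." + digits[-5:]
        return "{0}({1}{2}, ndigits={3})".format(
            type(self).__name__, sign, shown, len(digits)
        )


x = HP(2 ** 1024)
print(repr(x))               # compact: a few digits and the digit count
print(len(x.digits))         # 309
print(int(x.digits) == 2 ** 1024)  # True: the property round-trips exactly
```

The repr is no longer something you could paste back into the interpreter to rebuild the object, which is the usual trade-off for a readable debugging form; the `digits` property covers the cases where the exact value matters.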