Tags: python, data-structures, collections, abstract-data-type, functools

How to use python collections for custom classes


Still somewhat perplexed by Python and its magic functional programming, so I tend to find myself writing code that leans towards the Java paradigm of programming as opposed to idiomatic Python.

My question is somewhat related to: How do I make a custom class a collection in Python

The only difference is I have nested objects (using composition). The VirtualPage object is composed of a list of PhysicalPage objects. I have a function which can take a list of PhysicalPage objects and coalesce all of the details into a single named tuple I call PageBoundary. Essentially it's a serialization function which can spit out a tuple made up of an integer range which represents the physical page and the line number in the page. From this I can easily sort and order VirtualPages among one another (that's the idea at least):

PageBoundary = collections.namedtuple('PageBoundary', 'begin end')

I also have a function which can take a PageBoundary namedtuple and de-serialize or expand the tuple into a list of PhysicalPages. It's preferable that these two data storage classes not change as it will break any downstream code.

Here is a snippet of my custom Python 2.7 classes. VirtualPage is composed of a lot of things, one of which is a list containing PhysicalPage objects:

class VirtualPage(object):
    def __init__(self, _physical_pages=list()):
        self.physical_pages = _physical_pages


class PhysicalPage(object):
    # class variables: number of digits each attribute gets
    _PAGE_PAD, _LINE_PAD = 10, 12 

    def __init__(self, _page_num=-1):
        self.page_num = _page_num
        self.begin_line_num = -1
        self.end_line_num = -1

    def get_canonical_begin(self):
        # assumption: the line number encoded here is self.begin_line_num
        return int(''.join([str(self.page_num).zfill(PhysicalPage._PAGE_PAD),
                    str(self.begin_line_num).zfill(PhysicalPage._LINE_PAD)]))

    def get_canonical_end(self):
        pass # see get_canonical_begin() implementation

    def get_canonical_page_boundaries(self):
        return PageBoundary(self.get_canonical_begin(), self.get_canonical_end())
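
For illustration, the zero-padded integer built above can be sketched as a round trip. This is only a minimal sketch: `encode` and `decode` are hypothetical helper names, the pad widths mirror `_PAGE_PAD` and `_LINE_PAD`, and it assumes non-negative page and line numbers (the -1 defaults would not encode cleanly).

```python
PAGE_PAD, LINE_PAD = 10, 12  # digit widths, mirroring _PAGE_PAD and _LINE_PAD


def encode(page_num, line_num):
    # hypothetical helper: zero-pad both parts and concatenate the digits
    return int(str(page_num).zfill(PAGE_PAD) + str(line_num).zfill(LINE_PAD))


def decode(canonical):
    # hypothetical helper: split the integer back into (page_num, line_num)
    return divmod(canonical, 10 ** LINE_PAD)


value = encode(3, 7)
page_num, line_num = decode(value)
```

Because the line number always occupies the low 12 digits, the padded concatenation is the same as `page_num * 10**12 + line_num`, so plain integer comparison orders by page first and line second.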

I would like to leverage some templated collection (from the Python collections module) to easily sort and compare a list or set of VirtualPage objects. I would also like some advice on the layout of my data storage classes: VirtualPage and PhysicalPage.

Given either a sequence of VirtualPages or as in the example below:

vp_1 = VirtualPage(list_of_physical_pages)
vp_1_copy = VirtualPage(list_of_physical_pages)
vp_2 = VirtualPage(list_of_other_physical_pages)

I want to easily answer questions like this:

>>> vp_2 in vp_1 
False
>>> vp_2 < vp_1
True
>>> vp_1 == vp_1_copy
True

Right off the bat it seems obvious that the VirtualPage class needs to call get_canonical_page_boundaries or even implement the function itself. At a minimum it should loop over its PhysicalPage list to implement the required functions (__lt__() and __eq__()) so I can compare between VirtualPages.

1.) Currently I'm struggling with implementing some of the comparison functions. One big obstacle is how to compare a tuple. Do I create my own __lt__() function by creating a custom class which extends some type of collection:

import collections as col
import functools

@functools.total_ordering
class AbstractVirtualPageContainer(col.MutableSet):

    def __lt__(self, other):
        '''What type would other be?
        Make comparison by first normalizing to a comparable type: PageBoundary
        '''
        pass

2.) Should the comparison function implementation exist in the VirtualPage class instead?

I was leaning towards some type of Set data structure, as the data I'm modeling has the concept of uniqueness: i.e. physical page values cannot overlap and to some extent act as a linked list. Also, would setter or getter functions, implemented via @ decorators, be of any use here?


Solution

  • I think you want something like the code below. Not tested; certainly not tested for your application or with your data, YMMV, etc.

    from collections import namedtuple
    
    # PageBoundary is a subclass of named tuple with special relational
    # operators. __le__ and __ge__ are not overridden because they don't
    # make sense for this class (the versions inherited from tuple remain).
    class PageBoundary(namedtuple('PageBoundary', 'begin end')):
        # to prevent making an instance dict (See namedtuple docs)
        __slots__ = ()
    
        def __lt__(self, other):
            return self.end < other.begin
    
        def __eq__(self, other):
            # you can put in an assertion if you are concerned the
            # method might be called with the wrong type object
            assert isinstance(other, PageBoundary), "Wrong type for other"
    
            return self.begin == other.begin and self.end == other.end
    
        def __ne__(self, other):
            return not self == other
    
        def __gt__(self, other):
            return other < self
    
    
    class PhysicalPage(object):
        # class variables: number of digits each attribute gets
        _PAGE_PAD, _LINE_PAD = 10, 12 
    
        def __init__(self, page_num):
            self.page_num = page_num
    
            # single leading underscore is 'private' by convention
            # not enforced by the language
            self._begin = self.page_num * 10**PhysicalPage._LINE_PAD + tmp_line_num
            # (tmp_line_num above is a placeholder; it is not defined yet)
            #self._end = ...however you calculate this...
    
            self.begin_line_num = -1
            self.end_line_num = -1
    
        # this serves the purpose of a `getter`, but looks just like
        # a normal class member access. used like x = page.begin  
        @property
        def begin(self):
            return self._begin
    
        @property
        def end(self):
            return self._end
    
        def __lt__(self, other):
            assert(isinstance(other, PhysicalPage))
            return self._end < other._begin
    
        def __eq__(self, other):
            assert(isinstance(other, PhysicalPage))
            return (self._begin, self._end) == (other._begin, other._end)
    
        def __ne__(self, other):
            return not self == other
    
        def __gt__(self, other):
            return other < self
    
    
    class VirtualPage(object):
        def __init__(self, physical_pages=None):
            self.physical_pages = sorted(physical_pages) if physical_pages else []
    
        def __lt__(self, other):
            if self.physical_pages and other.physical_pages:
                return self.physical_pages[-1].end < other.physical_pages[0].begin
    
            else:
                raise ValueError
    
        def __eq__(self, other):
            if self.physical_pages and other.physical_pages:
                return self.physical_pages == other.physical_pages
    
            else:
                raise ValueError
    
        def __gt__(self, other):
            return other < self
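
To tie the classes above back to the three checks in the question, here is a reduced, self-contained sketch. `Span` is a hypothetical stand-in for the page boundaries, the numbers are made up, and the meaning given to `in` (every span of one page lying inside a span of the other) is an assumption, since the question does not define it.

```python
from collections import namedtuple

# Hypothetical (begin, end) span; the question's PageBoundary plays this role.
Span = namedtuple('Span', 'begin end')


class VirtualPage(object):
    def __init__(self, spans=None):
        # sort so the first and last spans bound the whole virtual page
        self.spans = sorted(spans) if spans else []

    def __eq__(self, other):
        return self.spans == other.spans

    def __ne__(self, other):
        return not self == other

    def __lt__(self, other):
        # strictly before: this page ends before the other one starts
        return self.spans[-1].end < other.spans[0].begin

    def __gt__(self, other):
        return other < self

    def __contains__(self, other):
        # assumed meaning of `in`: each span of other lies inside a span of self
        return all(any(s.begin <= o.begin and o.end <= s.end for s in self.spans)
                   for o in other.spans)


vp_1 = VirtualPage([Span(30, 40), Span(41, 55)])
vp_1_copy = VirtualPage([Span(41, 55), Span(30, 40)])
vp_2 = VirtualPage([Span(1, 9), Span(10, 20)])
overlap = VirtualPage([Span(15, 35)])

checks = (vp_2 in vp_1, vp_2 < vp_1, vp_1 == vp_1_copy)
```

With this data `checks` matches the session in the question. Note that `overlap` is neither less than, greater than, nor equal to `vp_1`, which is why `<=` and `>=` have no obvious meaning for overlapping ranges.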
    

    And a few observations:

    Although there is no such thing as "private" members in Python classes, it is a convention to begin a variable name with a single underscore, _, to indicate it is not part of the public interface of the class, module, etc. So naming the parameters of public methods with a leading '_' doesn't seem correct, e.g., def __init__(self, _page_num=-1).

    Python generally doesn't use setters / getters; just use the attributes directly. If an attribute value needs to be calculated, or some other processing is needed, use the @property decorator (as shown for PhysicalPage.begin above).
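
As a small sketch of that point, the property below computes its value on each access instead of storing it. `Page` is a hypothetical, cut-down stand-in for PhysicalPage, and taking `begin_line_num` as a constructor argument is an assumption made only to keep the example runnable.

```python
class Page(object):
    # hypothetical stand-in for PhysicalPage, reduced to what the example needs
    _LINE_PAD = 12

    def __init__(self, page_num, begin_line_num):
        self.page_num = page_num
        self.begin_line_num = begin_line_num

    @property
    def begin(self):
        # computed on every access, so it tracks later changes to the attributes
        return self.page_num * 10 ** Page._LINE_PAD + self.begin_line_num


page = Page(3, 7)
first = page.begin        # attribute syntax, no parentheses
page.begin_line_num = 8
second = page.begin       # reflects the updated line number
```

Callers read `page.begin` like any other attribute, so a plain attribute can later be swapped for a property without changing the calling code.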

    It's generally not a good idea to use a mutable object as a default function argument. def __init__(self, _physical_pages=list()) does not bind _physical_pages to a new empty list on each call; the default is evaluated once, when the function is defined, and the same list is reused every time. If that list is modified, the next call that relies on the default sees the modified list. See the VirtualPage initializer above for an alternative.
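
The pitfall is easy to reproduce. In this minimal sketch both class names are hypothetical; the second one uses the common `None` sentinel idiom, which may or may not suit your real initializer.

```python
class SharedDefault(object):
    # hypothetical class: the default list is created once, at definition time
    def __init__(self, pages=list()):
        self.pages = pages


class FreshDefault(object):
    # hypothetical class: None is the sentinel, a new list is built per call
    def __init__(self, pages=None):
        self.pages = list(pages) if pages else []


a, b = SharedDefault(), SharedDefault()
a.pages.append(1)
shared = b.pages          # b sees the element appended through a

c, d = FreshDefault(), FreshDefault()
c.pages.append(1)
fresh = d.pages           # d still has its own empty list
```

Copying with `list(pages)` also keeps the instance from sharing a list that the caller might modify later.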