Search code examples
pythonfactoryfactory-pattern

Improper use of __new__ to generate class instances?


I'm creating some classes for dealing with filenames in various types of file shares (nfs, afp, s3, local disk) etc. I get as user input a string that identifies the data source (i.e. "nfs://192.168.1.3" or "s3://mybucket/data") etc.

I'm subclassing the specific filesystems from a base class that has common code. Where I'm confused is in the object creation. What I have is the following:

import os

class FileSystem(object):
    class NoAccess(Exception):
        pass

    def __new__(cls,path):
        if cls is FileSystem:
            if path.upper().startswith('NFS://'): 
                return super(FileSystem,cls).__new__(Nfs)
            else: 
                return super(FileSystem,cls).__new__(LocalDrive)
        else:
            return super(FileSystem,cls).__new__(cls,path)

    def count_files(self):
        raise NotImplementedError

class Nfs(FileSystem):
    def __init__ (self,path):
        pass

    def count_files(self):
        pass

class LocalDrive(FileSystem):
    def __init__(self,path):
        if not os.access(path, os.R_OK):
            raise FileSystem.NoAccess('Cannot read directory')
        self.path = path

    def count_files(self):
        return len([x for x in os.listdir(self.path) if os.path.isfile(os.path.join(self.path, x))])

data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('/var/log')

print type(data1)
print type(data2)

print data2.count_files()

I thought this would be a good use of __new__ but most posts I read about it's use discourage it. Is there a more accepted way to approach this problem?


Solution

  • I don't think using __new__() to do what you want is improper. In other words, I disagree with the accepted answer to this question which claims factory functions are always the "best way to do it".

    If you really want to avoid using it, then the only options are metaclasses or a separate factory function/method (however see Python 3.6+ Update below). Given the choices available, making the __new__() method one — since it's static by default — is a perfectly sensible approach.

    That said, below is what I think is an improved version of your code. I've added a couple of class methods to assist in automatically finding all the subclasses. These support the most important way in which it's better — which is now adding subclasses doesn't require modifying the __new__() method. This means it's now easily extensible since it effectively supports what you could call virtual constructors.

    A similar implementation could also be used to move the creation of instances out of the __new__() method into a separate (static) factory method — so in one sense the technique shown is just a relatively simple way of coding an extensible generic factory function regardless of what name it's given.

    # Works in Python 2 and 3.
    
    import os
    import re
    
    class FileSystem(object):
        class NoAccess(Exception): pass
        class Unknown(Exception): pass
    
        # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
        _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    
        @classmethod
        def _get_all_subclasses(cls):
            """ Recursive generator of all class' subclasses. """
            for subclass in cls.__subclasses__():
                yield subclass
                for subclass in subclass._get_all_subclasses():
                    yield subclass
    
        @classmethod
        def _get_prefix(cls, s):
            """ Extract any file system prefix at beginning of string s and
                return a lowercase version of it or None when there isn't one.
            """
            match = cls._PATH_PREFIX_PATTERN.match(s)
            return match.group(1).lower() if match else None
    
        def __new__(cls, path):
            """ Create instance of appropriate subclass using path prefix. """
            path_prefix = cls._get_prefix(path)
    
            for subclass in cls._get_all_subclasses():
                if subclass.prefix == path_prefix:
                    # Using "object" base class method avoids recursion here.
                    return object.__new__(subclass)
            else:  # No subclass with matching prefix found (& no default defined)
                raise FileSystem.Unknown(
                    'path "{}" has no known file system prefix'.format(path))
    
        def count_files(self):
            raise NotImplementedError
    
    
    class Nfs(FileSystem):
        prefix = 'nfs'
    
        def __init__ (self, path):
            pass
    
        def count_files(self):
            pass
    
    
    class LocalDrive(FileSystem):
        prefix = None  # Default when no file system prefix is found.
    
        def __init__(self, path):
            if not os.access(path, os.R_OK):
                raise FileSystem.NoAccess('Cannot read directory')
            self.path = path
    
        def count_files(self):
            return sum(os.path.isfile(os.path.join(self.path, filename))
                         for filename in os.listdir(self.path))
    
    
    if __name__ == '__main__':
    
        data1 = FileSystem('nfs://192.168.1.18')
        data2 = FileSystem('c:/')  # Change as necessary for testing.
    
        print(type(data1).__name__)  # -> Nfs
        print(type(data2).__name__)  # -> LocalDrive
    
        print(data2.count_files())  # -> <some number>
    

    Python 3.6+ Update

    The code above works in both Python 2 and 3.x. However in Python 3.6 a new class method was added to object named __init_subclass__() which makes the finding of subclasses simpler by using it to automatically create a "registry" of them instead of potentially having to check every subclass recursively as the _get_all_subclasses() method is doing in the above.

    I got the idea of using __init_subclass__() to do this from the Subclass registration section in the PEP 487 -- Simpler customisation of class creation proposal. Since the method will be inherited by all the base class' subclasses, registration will automatically be done for sub-subclasses, too (as opposed to only to direct subclasses) — it completely eliminates the need for a method like _get_all_subclasses().

    # Requires Python 3.6+
    
    import os
    import re
    
    class FileSystem(object):
        class NoAccess(Exception): pass
        class Unknown(Exception): pass
    
        # Pattern for matching "xxx://"  # x is any non-whitespace character except for ":".
        _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
        _registry = {}  # Registered subclasses.
    
        @classmethod
        def __init_subclass__(cls, /, path_prefix, **kwargs):
            super().__init_subclass__(**kwargs)
            cls._registry[path_prefix] = cls  # Add class to registry.
    
        @classmethod
        def _get_prefix(cls, s):
            """ Extract any file system prefix at beginning of string s and
                return a lowercase version of it or None when there isn't one.
            """
            match = cls._PATH_PREFIX_PATTERN.match(s)
            return match.group(1).lower() if match else None
    
        def __new__(cls, path):
            """ Create instance of appropriate subclass. """
            path_prefix = cls._get_prefix(path)
            subclass = cls._registry.get(path_prefix)
            if subclass:
                return object.__new__(subclass)
            else:  # No subclass with matching prefix found (and no default).
                raise cls.Unknown(
                    f'path "{path}" has no known file system prefix')
    
        def count_files(self):
            raise NotImplementedError
    
    
    class Nfs(FileSystem, path_prefix='nfs'):
        def __init__ (self, path):
            pass
    
        def count_files(self):
            pass
    
    class Ufs(Nfs, path_prefix='ufs'):
        def __init__ (self, path):
            pass
    
        def count_files(self):
            pass
    
    class LocalDrive(FileSystem, path_prefix=None):  # Default file system.
        def __init__(self, path):
            if not os.access(path, os.R_OK):
                raise self.NoAccess(f'Cannot read directory {path!r}')
            self.path = path
    
        def count_files(self):
            return sum(os.path.isfile(os.path.join(self.path, filename))
                         for filename in os.listdir(self.path))
    
    
    if __name__ == '__main__':
    
        data1 = FileSystem('nfs://192.168.1.18')
        data2 = FileSystem('c:/')  # Change as necessary for testing.
        data4 = FileSystem('ufs://192.168.1.18')
    
        print(type(data1))  # -> <class '__main__.Nfs'>
        print(type(data2))  # -> <class '__main__.LocalDrive'>
        print(f'file count: {data2.count_files()}')  # -> file count: <some number>
    
        try:
            data3 = FileSystem('c:/foobar')  # A non-existent directory.
        except FileSystem.NoAccess as exc:
            print(f'{exc} - FileSystem.NoAccess exception raised as expected')
        else:
            raise RuntimeError("Non-existent path should have raised Exception!")
    
        try:
            data4 = FileSystem('foobar://42')  # Unregistered path prefix.
        except FileSystem.Unknown as exc:
            print(f'{exc} - FileSystem.Unknown exception raised as expected')
        else:
            raise RuntimeError("Unregistered path prefix should have raised Exception!")