python oop dependency-injection mixins

Most appropriate way to combine features of a class to another?


Hey guys I'm new here but hope my question is clear.

My code is written in Python. I have a base class representing a general website; it holds some basic methods to fetch the data from the website and save it. That class is extended by many other classes, each representing a different website and holding attributes specific to that website, and each subclass uses the base class methods to fetch the data. Every site needs its data parsed, but many sites share the same parsing functionality. So I created several parsing classes (about six) that hold the functionality and properties for the different parsing methods. I started to think about what would be the best way to integrate those classes with the website classes that need them.

At first I thought that each website class would hold a class variable with the parser class that corresponds to it but then I thought there must be some better way to do it.

I read a bit and thought I might be better off relying on mixins to integrate the parsers for each website. That would work, but it doesn't "sound" right: the website class has no business inheriting from the parser class (even though it is only a mixin and not meant to be full-on class inheritance), since the two aren't related in any way except that the website uses the parser functionality.

Then I thought I might rely on some dependency injection code I saw for Python to inject the parser into each website, but that sounded like overkill.

So I guess my question basically is: when is it best to use each approach (in my project and in any other project, really)? They all do the job, but none of them seems to be the best fit.

Thank you for any help you may offer, I hope I was clear.

Adding a small mock example to illustrate:

class BaseWebsite():
    def fetch(): # Shared by all subclasses websites
       ....
    def save(): # Shared by all subclasses websites
       ....

class FirstWebsite(BaseWebsite): # Uses parsing method one
    ....
class SecondWebsite(BaseWebsite): # Uses parsing method one
    ....
class ThirdWebsite(BaseWebsite): # Uses parsing method two
    ....

and so forth


Solution

  • I think your problem is that you're using subclasses where you should be using instances.

    From your description, there's one class for each website, with a bunch of attributes. Presumably you create singleton instances of each of the classes. There's rarely a good reason to do this in Python. If each website needs different data—a base URL, a parser object/factory/function, etc.—you can just store it in instance attributes, so each website can be an instance of the same class.

    If the websites actually need to, say, override base class methods in different ways, then it makes sense for them to be different classes (although even there, you should consider moving that functionality into external functions or objects that the websites can use, as you already have with the parser). But if not, there's no good reason to do this.
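
As a rough sketch of that last point: a step that a subclass would otherwise override can be passed in as a plain callable instead. The `clean` step and every function name below are made up for illustration and are not taken from your code.

```python
def default_clean(raw):
    """Hypothetical default step: strip surrounding whitespace."""
    return raw.strip()


def lowercase_clean(raw):
    """Hypothetical site-specific variant of the same step."""
    return raw.strip().lower()


def split_words(text):
    """Hypothetical parser: split the text on whitespace."""
    return text.split()


class Website:
    def __init__(self, url, parser, clean=default_clean):
        self.url = url
        self.parser = parser
        self.clean = clean  # injected behaviour instead of an overridden method

    def process(self, raw):
        # The class owns the workflow; the steps that vary are attributes.
        return self.parser(self.clean(raw))


plain = Website("http://example.com/foo", split_words)
shouty = Website("http://example.com/bar", split_words, clean=lowercase_clean)
```

Here both sites are instances of the same class; only the callable handed to the constructor differs, so there is nothing to subclass.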

    Of course I could be wrong here, but the fact that you defined old-style classes, left the self parameter out of your methods, talked about class attributes, and generally used Java terminology instead of Python terminology makes me think that this mistake isn't too unlikely.

    In other words, what you want is:

    class Website:
        def __init__(self, parser, spam, eggs):
            self.parser = parser
            # ...
        def fetch(self):
            data = ...  # fetching elided
            soup = self.parser(data)
            # ...
    
    first_website = Website(parser_one, urls[0], 23)
    second_website = Website(parser_one, urls[1], 42)
    third_website = Website(parser_two, urls[2], 69105)
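
For something you can actually run, here is a minimal version of the same idea. The regex-based parsers and the canned `PAGES` dict are hypothetical stand-ins assumed purely for illustration. A real `fetch` would download the page, and the constructor here takes only a parser and a URL to keep it short.

```python
import re


# Hypothetical stand-ins for two of the shared parsers.
def parser_one(data):
    """Return the text of every <h1> element."""
    return re.findall(r"<h1>(.*?)</h1>", data)


def parser_two(data):
    """Return every href target."""
    return re.findall(r'href="(.*?)"', data)


# Canned pages stand in for the network so the sketch runs offline.
PAGES = {
    "http://example.com/foo": '<h1>Foo</h1><a href="/a">a</a>',
    "http://example.com/bar": '<h1>Bar</h1><a href="/b">b</a>',
}


class Website:
    def __init__(self, parser, url):
        self.parser = parser
        self.url = url

    def fetch(self):
        data = PAGES[self.url]    # a real fetch would download self.url
        return self.parser(data)  # each instance carries its own parser


first_website = Website(parser_one, "http://example.com/foo")
second_website = Website(parser_two, "http://example.com/bar")
```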
    

    Let's say you have 20 websites. If you're creating 20 subclasses, you're writing half a dozen lines of boilerplate for each, and there's a whole lot you can get wrong with the details which may be painful to debug. If you're creating 20 instances, it's just a few characters of boilerplate, and a lot less to get wrong:

    websites = [Website(parser_one, urls[0], 23),
                Website(parser_two, urls[1], 42),
                # ...
               ]
    

    Or you can even move the data to a data file. For example, a CSV like this:

    url,parser,spam
    http://example.com/foo,parser_one,23
    http://example.com/bar,parser_two,42
    …
    

    You can edit this more easily—or even use a spreadsheet program to do it—with no need for any extraneous typing. And you can import it into Python with a couple lines of code:

    import csv

    with open('websites.csv', newline='') as f:
        websites = [Website(**row) for row in csv.DictReader(f)]
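
One caveat with that two-liner: `csv.DictReader` yields every field as a string, so the `parser` column holds a name like `'parser_one'` rather than a callable, and the column names have to match the constructor's parameters. Below is a minimal sketch of one way to handle that. It assumes a hypothetical `PARSERS` lookup table and a `Website.__init__(url, parser, spam)` signature matching the CSV header above; the two parser bodies are placeholders.

```python
import csv
import io


def parser_one(data):
    """Placeholder parser: split on whitespace."""
    return data.split()


def parser_two(data):
    """Placeholder parser: split into lines."""
    return data.splitlines()


# Hypothetical registry mapping the names used in the CSV to callables.
PARSERS = {"parser_one": parser_one, "parser_two": parser_two}


class Website:
    def __init__(self, url, parser, spam):
        self.url = url
        self.parser = parser
        self.spam = spam


def load_websites(f):
    """Build one Website per CSV row, converting strings as needed."""
    websites = []
    for row in csv.DictReader(f):
        websites.append(Website(
            url=row["url"],
            parser=PARSERS[row["parser"]],  # KeyError on an unknown name
            spam=int(row["spam"]),
        ))
    return websites


# An in-memory file stands in for websites.csv so the sketch is self-contained.
SAMPLE = """url,parser,spam
http://example.com/foo,parser_one,23
http://example.com/bar,parser_two,42
"""
websites = load_websites(io.StringIO(SAMPLE))
```

With a real file you would pass the handle from `open('websites.csv', newline='')` to `load_websites` instead of the `StringIO` object.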