Tags: python, python-import, python-3.7

Dynamically discover python modules and classes within them


I'm developing a state machine in Python 3.7 that will import "Actions" (defined as classes in their own modules). What I'd like is for any application implementing the state machine to be able to call an import_actions() method and pass it the name of the package containing them.

I haven't got that far yet. I am currently trying to import the core actions needed by the state machine to bootstrap itself and, to be honest, I'm struggling ...

I have this directory structure:

tree ism/core

ism/core
├── __init__.py
├── action.py
├── action_confirm_ready_to_run.py
├── action_emergency_shutdown.py
├── action_normal_shutdown.py
├── action_process_inbound_messages.py
├── data.json
└── schema.json

My state machine is in ism/ and the actions it's trying to import are in ism/core.

At the moment each action is just an empty class like so:



from ism.core.action import Action


class ActionNormalShutdown(Action):
    pass

What I need to do is:

  1. dynamically discover these files. I can see them in this case, but later, when importing third-party actions, I won't know what's in the package.

  2. import them, and

  3. discover the name of the class inside them (e.g. ActionNormalShutdown) so that I can instantiate each one and add them to a collection.

After that, the state machine will endlessly loop over the collection and call each one's execute() method.

So each "Action pack" would be a python package and each action would be a class in its own module.

In my __init__.py for the core package I have this code to dynamically create the __all__ variable:

from os.path import dirname, basename, isfile, join
import glob
modules = glob.glob(join(dirname(__file__), "action*.py"))
__all__ = [ basename(f)[:-3] for f in modules if isfile(f) and not f.endswith('__init__.py')]

And I have this method in my state machine which seems to get all of the modules in the ism.core package:

def __import_core_actions(self):
    """Import the core actions for the ISM"""

    import importlib.util

    core_actions = pkg_resources.contents('ism.core')
    for action in core_actions:
        for file in importlib.import_module('ism.core').modules:
            spec = importlib.util.spec_from_file_location("MyAction", file)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)

Which I think is actually "loading" the modules. I put that in quotes because I'm not sure if loading is the same as importing ...

Another thing I'm unsure of – in the call to importlib.util.spec_from_file_location("MyAction", file)

I use a name of MyAction, mainly because I need a name parameter, but I don't know why it's needed or whether I should then be using it. Any help clarifying that would be greatly appreciated.

So if the method is actually importing the modules, how can I extend the code to instantiate each class found in the modules and add each instance to the collection?
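
On those two points, here is a minimal sketch rather than a definitive answer. The file name action_demo.py and the module name my_pack.action_demo are made up for illustration. It shows that the name passed to spec_from_file_location() becomes the loaded module's __name__, and that exec_module() runs the file without registering the module in sys.modules, which is one way "loading" from a path differs from a normal import:

```python
import importlib.util
import os
import sys
import tempfile

# Write a throwaway module to disk (hypothetical file name, for illustration only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "action_demo.py")
with open(path, "w") as handle:
    handle.write("class ActionDemo:\n    pass\n")

# The first argument is the name the loaded module will carry.
spec = importlib.util.spec_from_file_location("my_pack.action_demo", path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)  # runs the file's code inside `module`

print(module.__name__)                       # my_pack.action_demo
print(module.ActionDemo.__module__)          # classes defined in the file record it too
print("my_pack.action_demo" in sys.modules)  # False: nothing was registered
```

By contrast, importlib.import_module() resolves a dotted name through sys.path and caches the result in sys.modules, so a second import of the same name returns the same module object.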

Looking ahead, third party developers would likely have their action package installed on the local system and could just pass in the name to my import method, leaving me to discover the content. So I could generalise this method to accept a package name and then call it with 'ism.core' for my own needs.

Can you help me with this dynamic introspection?


Solution

  • I got to the bottom of what I was trying to achieve and for the sake of clarity and anyone hitting this post from a search engine, I've decided to post an answer / explanation to my own question.

    A clearer explanation of what I was trying to achieve:

    I'm developing a state machine that can import "packs" (i.e. packages) of actions that express some specific functionality.

    An 'Action' is a class that inherits from my BaseAction class and implements an 'execute()' method. It should also have the substring 'Action' in its class name.

    Here is an example package that I'm developing. It allows the state machine to react to inbound file-based messages appearing in an inbound directory, as well as send file-based messages out via an outbound directory:
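
    A minimal sketch of that contract follows. The real BaseAction is not shown in this post, so the class body, the constructor taking a single dict of arguments, and the ActionSayHello example are all assumptions made for illustration:

```python
class BaseAction:
    """Hypothetical stand-in for the real base class (not shown in this post)."""

    def __init__(self, args):
        # Assumed: the state machine passes one dict of shared resources.
        self.dao = args.get("dao")
        self.properties = args.get("properties")

    def execute(self):
        raise NotImplementedError("Each action must implement execute()")


class ActionSayHello(BaseAction):
    """Example action: inherits BaseAction and has 'Action' in its name."""

    def execute(self):
        return "hello"


actions = [ActionSayHello({"dao": None, "properties": {}})]

# The real state machine loops endlessly; three passes are enough for a demo.
results = []
for _ in range(3):
    for action in actions:
        results.append(action.execute())
print(results)  # ['hello', 'hello', 'hello']
```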

    ism_comms/
    ├── __init__.py
    ├── file
    │   ├── __init__.py
    │   └── actions
    │       ├── __init__.py
    │       ├── action_io_file_before.py
    │       ├── action_io_file_inbound.py
    │       ├── action_io_file_outbound.py
    │       ├── data.json
    │       └── schema.json
    

    As you can see, the ism_comms.file.actions package contains three actions:

    1. action_io_file_before.py - creates the inbound and outbound directories prior to the state machine switching from STARTING to RUNNING state.
    2. action_io_file_inbound.py - monitors the inbound directory for messages (json files) and loads them into the messages table in the control database.
    3. action_io_file_outbound.py - monitors the messages table for outbound messages and writes them to the outbound directory.

    They are in their own package, alongside schema.json and data.json files.

    schema.json

    {
        "mysql": {
            "tables": [
                "CREATE TABLE IF NOT EXISTS messages ( message_id INTEGER NOT NULL AUTO_INCREMENT COMMENT 'Record ID in recipient messages table', sender TEXT NOT NULL COMMENT 'Return address of sender', sender_id INTEGER NOT NULL COMMENT 'Record ID in sender messages table', recipient TEXT NOT NULL COMMENT 'Address of recipient', action TEXT NOT NULL COMMENT 'Name of the action that handles this message', payload TEXT COMMENT 'Json body of msg payload', sent TEXT NOT NULL COMMENT 'Timestamp msg sent by sender', received TIMESTAMP DEFAULT CURRENT_TIMESTAMP COMMENT 'Time ism loaded message into database', direction TEXT NOT NULL COMMENT 'In or outbound message', processed BOOLEAN NOT NULL DEFAULT '0' COMMENT 'Has the message been processed?', PRIMARY KEY(message_id) );"
            ]
        },
        "sqlite3": {
            "tables": [
                "CREATE TABLE IF NOT EXISTS messages (\nmessage_id INTEGER NOT NULL PRIMARY KEY, -- Record ID in recipient messages table\nsender TEXT NOT NULL, -- Return address of sender\nsender_id INTEGER NOT NULL, -- Record ID in sender messages table\nrecipient TEXT NOT NULL, -- Address of recipient\naction TEXT NOT NULL, -- Name of the action that handles this message\npayload TEXT, -- Json body of msg payload\nsent TEXT NOT NULL, -- Timestamp msg sent by sender\nreceived TEXT NOT NULL DEFAULT (strftime('%s', 'now')), -- Timestamp ism loaded message into database\ndirection TEXT NOT NULL DEFAULT 'inbound', -- In or outbound message\nprocessed BOOLEAN NOT NULL DEFAULT '0' -- Has the message been processed\n);"
            ]
        }
    }
    

    data.json

    {
        "mysql": {
            "inserts": [
                "INSERT INTO actions VALUES(NULL,'ActionIoFileBefore','STARTING','null',1)",
                "INSERT INTO actions VALUES(NULL,'ActionIoFileInbound','RUNNING','null',0)",
                "INSERT INTO actions VALUES(NULL,'ActionIoFileOutbound','RUNNING','null',0)"
            ]
        },
        "sqlite3": {
            "inserts": [
                "INSERT INTO actions VALUES(NULL,'ActionIoFileBefore','STARTING','null',1)",
                "INSERT INTO actions VALUES(NULL,'ActionIoFileInbound','RUNNING','null',0)",
                "INSERT INTO actions VALUES(NULL,'ActionIoFileOutbound','RUNNING','null',0)"
            ]
        }
    }
    

    What I was trying to achieve:

    In a nutshell, I wanted contributors to be able to import their own action packs simply by passing the package name to the state machine.

    To do this I needed to use introspection and wasn't familiar with how to do so in Python. Hence my posting the question above. Now that I've worked through how to achieve this using importlib, I'll post the code here.

    Any comments on how to make this more 'Pythonic' are most welcome as I'm relatively new to the language.

    The import method:

    # Module-level imports used by the method
    import importlib
    import inspect
    import logging
    import pkgutil

    def import_action_pack(self, pack):
        """Import an action pack

        An application can pass in action packs to enable the ISM to express
        specific functionality. For an example of how to call this method,
        see the unit test in tests/test_ism.py (test_import_action_pack).

        Each action pack is a python package containing:
            * At least one action class inheriting from ism.core.BaseAction
            * A data.json file containing at least the insert statements for the
              action in the control DB.
            * Optionally a schema.json file containing the create statements for any
              tables the action needs in the control DB.

        The package should contain nothing else and no sub packages.
        """
        action_args = {
            "dao": self.dao,
            "properties": self.properties
        }

        try:
            # Import the package containing the actions
            package = importlib.import_module(pack)
            # Find each action module in the package
            for _, modname, ispkg in pkgutil.iter_modules(package.__path__):
                # Should not be any sub packages in there
                if ispkg:
                    raise MalformedActionPack(
                        f'Passed malformed action pack ({pack}). Unexpected sub package {modname}'
                    )
                # Import the module containing the action
                module = importlib.import_module(f'{pack}.{modname}')
                # Instantiate each action class and add it to the collection of actions
                for name, cls in inspect.getmembers(module, inspect.isclass):
                    if name == 'BaseAction':
                        continue
                    if 'Action' in name:
                        self.actions.append(cls(action_args))

            # Get the supporting DB file/s
            self.import_action_pack_tables(package)

        except ModuleNotFoundError:
            logging.error(f'Module/s not found for argument ({pack})')
            raise
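
    The discovery steps can be tried out without a real action pack. The sketch below is not the ISM code. It builds a throwaway package in a temporary directory, and demo_base, demo_pack, ActionOne, ActionTwo and discover_actions are all made-up names. It also swaps in a different filter: instead of matching the substring 'Action' in the class name, it uses issubclass(), which skips the imported base class without relying on naming:

```python
import importlib
import inspect
import os
import pkgutil
import sys
import tempfile

# Build a throwaway base module and action pack on disk (hypothetical names).
root = tempfile.mkdtemp()
pack_dir = os.path.join(root, "demo_pack")
os.mkdir(pack_dir)
sources = {
    os.path.join(root, "demo_base.py"): (
        "class BaseAction:\n"
        "    def __init__(self, args):\n"
        "        self.args = args\n"
    ),
    os.path.join(pack_dir, "__init__.py"): "",
    os.path.join(pack_dir, "action_one.py"): (
        "from demo_base import BaseAction\n"
        "class ActionOne(BaseAction):\n"
        "    def execute(self):\n"
        "        return 'one'\n"
    ),
    os.path.join(pack_dir, "action_two.py"): (
        "from demo_base import BaseAction\n"
        "class ActionTwo(BaseAction):\n"
        "    def execute(self):\n"
        "        return 'two'\n"
    ),
}
for path, source in sources.items():
    with open(path, "w") as handle:
        handle.write(source)

sys.path.insert(0, root)       # make the temporary modules importable
importlib.invalidate_caches()  # the files were created after interpreter start
BaseAction = importlib.import_module("demo_base").BaseAction


def discover_actions(pack_name, action_args):
    """Import every module in the package and instantiate its action classes."""
    package = importlib.import_module(pack_name)
    found = []
    for _finder, modname, _ispkg in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{pack_name}.{modname}")
        for _name, cls in inspect.getmembers(module, inspect.isclass):
            # Subclass test instead of a name test; skips the imported base class.
            if issubclass(cls, BaseAction) and cls is not BaseAction:
                found.append(cls(action_args))
    return found


actions = discover_actions("demo_pack", {"dao": None, "properties": {}})
print(sorted(type(action).__name__ for action in actions))  # ['ActionOne', 'ActionTwo']
```

    The subclass check also ignores unrelated classes that a module happens to import, at the cost of requiring the base class to be importable before discovery starts.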
    

    The importing of the schema and SQL:

    # Module-level imports used by the method
    import json
    import os

    def import_action_pack_tables(self, package):
        """An action will typically create some tables and insert standing data.

        If a supporting schema file exists, then create the tables. A data.json
        file must exist with at least one insert for the actions table or the action
        execute method will not be able to activate or deactivate.
        """
        # The same key (e.g. 'mysql' or 'sqlite3') selects statements in both files
        rdbms = self.properties['database']['rdbms'].lower()
        inserts_found = False
        path = os.path.dirname(package.__file__)
        for root, dirs, files in os.walk(path):
            if 'schema.json' in files:
                schema_file = os.path.join(root, 'schema.json')
                with open(schema_file) as tables:
                    schema = json.load(tables)
                for table in schema[rdbms]['tables']:
                    self.dao.execute_sql_statement(table)

            if 'data.json' in files:
                data_file = os.path.join(root, 'data.json')
                with open(data_file) as statements:
                    inserts = json.load(statements)
                for insert in inserts[rdbms]['inserts']:
                    self.dao.execute_sql_statement(insert)
                inserts_found = True

        # Checked once, after the whole package directory has been scanned
        if not inserts_found:
            raise MalformedActionPack(f'No insert statements found for action pack ({package.__name__})')
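
    To see the JSON-driven setup in isolation, here is a rough sketch that runs against an in-memory SQLite database. It is not the ISM code: the layout of the actions table is not shown in this post, so the column names below are invented, and load_statements is a hypothetical helper. Only the "rdbms name, then tables or inserts" shape of the JSON files is taken from the examples above:

```python
import json
import os
import sqlite3
import tempfile

# Throwaway "action pack" directory holding simplified schema.json and data.json
# files. The table layout is invented for this demo, not the real ISM schema.
pack_dir = tempfile.mkdtemp()
schema = {"sqlite3": {"tables": [
    "CREATE TABLE IF NOT EXISTS actions ("
    "action_id INTEGER PRIMARY KEY, action TEXT NOT NULL, "
    "execution_phase TEXT NOT NULL, payload TEXT, active INTEGER NOT NULL)"
]}}
data = {"sqlite3": {"inserts": [
    "INSERT INTO actions VALUES(NULL,'ActionIoFileBefore','STARTING','null',1)",
    "INSERT INTO actions VALUES(NULL,'ActionIoFileInbound','RUNNING','null',0)",
]}}
for filename, content in (("schema.json", schema), ("data.json", data)):
    with open(os.path.join(pack_dir, filename), "w") as handle:
        json.dump(content, handle)


def load_statements(directory, filename, rdbms, key):
    """Return the SQL statements for one RDBMS, or [] if the file is absent."""
    path = os.path.join(directory, filename)
    if not os.path.isfile(path):
        return []
    with open(path) as handle:
        return json.load(handle)[rdbms.lower()][key]


connection = sqlite3.connect(":memory:")
rdbms = "SQLITE3"  # stands in for the value read from the properties

# schema.json is optional, so an empty list here is fine.
for statement in load_statements(pack_dir, "schema.json", rdbms, "tables"):
    connection.execute(statement)

# data.json is mandatory: without inserts the actions cannot be activated.
inserts = load_statements(pack_dir, "data.json", rdbms, "inserts")
if not inserts:
    raise RuntimeError("No insert statements found")
for statement in inserts:
    connection.execute(statement)

rows = connection.execute(
    "SELECT action, active FROM actions ORDER BY action_id"
).fetchall()
print(rows)  # [('ActionIoFileBefore', 1), ('ActionIoFileInbound', 0)]
```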