
best practices when using Perl modules


I'm fairly new to modules and I'm trying to use them in my scripts. I'm having trouble finding the right way to use them, and I'd like your advice.

Let me quickly explain what I'm trying to do:

My script does some file transfers, based on data from XML files.

So basically, I have XML files with contents like this:

<fftg>
    <actions>

            <!-- Rename file(s) -->
            <rename>
                    <mandatory>0</mandatory>
                    <file name="foo" to="bar" />
            </rename>

            <!-- Transfer file(s) -->
            <transfer>
                    <mandatory>0</mandatory>
                    <protocol>SFTP</protocol>
                    <server>fqdn</server>
                    <port>22</port>
                    <file name="bar" remotefolder="toto" />
            </transfer>

            <!-- Transfer file(s) -->
            <transfer>
                    <mandatory>0</mandatory>
                    <protocol>SFTP</protocol>
                    <server>fqdn</server>
                    <port>22</port>
                    <file name="blabla" remotefolder="xxxx" />
                    <file name="blabla2" remotefolder="xxxx" />
            </transfer>

    </actions>
</fftg>

In short, I have a script that performs "actions". Each action can be repeated any number of times.

Now, instead of having one big script with a bunch of subroutines, I think it would be better to create modules for my app and put the actions in modules.

For example:

FFTG::Rename
FFTG::Transfer
FFTG::Transfer::SFTP
FFTG::Transfer::FTP

and so on (I've created all these modules and they work fine independently),

and call these modules depending on the actions specified in the XML file. People could create new modules/actions if required (I want the thing to be modular).

Now, I don't know how to do this properly.

So my question is: what is the best way to do this?

Currently, my script reads these actions like this:

# Load XML file
my $parser = XML::LibXML->new();
my $doc    = $parser->parse_file($FFTG_TSF . "/" . $tid . ".xml");

# Browse XML file
foreach my $transfer ($doc->findnodes('/fftg')) {

    # Grab generic information
    my($env) = $transfer->findnodes('./environment');
    my($desc) = $transfer->findnodes('./description');
    my($user) = $transfer->findnodes('./user');
    print $env->to_literal, "\n";

    # Browse Actions
    foreach my $action ($doc->findnodes('/fftg/actions/*')) {

            my $actiontype = ucfirst($action->nodeName());
            # how do i select a module from the $actiontype here ?     ($actiontype = Rename or Transfer)
            # i can't do : use FFTG::$actiontype::execaction(); or something for example, it doesnt work
            # and is it the right way of doing it ? 

    }
}

But maybe that's not the right way to think about it (I'm using XML::LibXML). How can I call the module "dynamically", using a variable in the name such as FFTG::$actiontype? And does it mean that I have to have the same subroutine in every module (for example, sub execaction), given that I want to send different data to each module?

Any hints? Thanks again.


Solution

  • First, you need to come up with a clear interface. Every module needs to have the same structure. It doesn't matter if it's OOP or not, but they all need to expose the same interface.

    Here's an example non-OOP implementation of FFTG::Rename. I've left out a lot of stuff, but I think it's clear what is happening.

    package FFTG::Rename;
    
    use strict;
    use warnings;
    
    sub run {
        my ($args) = @_;
    
        if ($args->{mandatory}) {
            # do stuff here
        }
    
        # check args...
        # do sanity checks...
        return unless -f $args->{file}->{name}; # or whatever...
    
        rename $args->{file}->{name}, $args->{file}->{to} or die $!;
    
        return; # maybe return something meaningful?
    }
    
    1; # a module file has to end with a true value
    

    Now let's assume we have a bunch of those. How do we load them? There are several ways to do this. I have omitted the part of getting the arguments into the run function. You'll need to take the stuff from the XML and pass it along in a way that's identical to all of those functions, but I think that's not relevant to the question.
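
    One way to get the arguments into a common shape is to flatten each action node into a plain hash reference before dispatching. The following is only a sketch: node_to_args is a hypothetical helper name, and collecting the <file> elements into a files list (rather than the single file hash used in the run example above) is an assumption made here because one action may contain several <file> elements.

    ```perl
    use strict;
    use warnings;
    use XML::LibXML;

    # Hypothetical helper: flatten one action element into a hash reference.
    sub node_to_args {
        my ($node) = @_;
        my %args = ( files => [] );

        for my $child ( $node->findnodes('./*') ) {
            if ( $child->nodeName eq 'file' ) {
                # every attribute of <file .../> becomes a key in its own hash
                my %file = map { ( $_->nodeName, $_->value ) } $child->findnodes('@*');
                push @{ $args{files} }, \%file;
            }
            else {
                # simple elements such as <mandatory>0</mandatory>
                $args{ $child->nodeName } = $child->textContent;
            }
        }
        return \%args;
    }

    # Small inline document (shape taken from the question) to try it out.
    my $xml = '<fftg><actions><rename><mandatory>0</mandatory>'
            . '<file name="foo" to="bar" /></rename></actions></fftg>';
    my $doc = XML::LibXML->load_xml( string => $xml );

    my ($rename) = $doc->findnodes('/fftg/actions/rename');
    my $args     = node_to_args($rename);
    print "rename $args->{files}[0]{name} to $args->{files}[0]{to}\n";
    ```

    Every run function then receives the same kind of structure, whatever elements a particular action happens to contain.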

    Load all of them

    The most obvious is to load all of them in your script manually.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use XML::LibXML;
    
    # load FFTG modules
    use FFTG::Rename;
    # ...
    

    Once they are loaded, you can call the function. The exists function is handy here because it can also be used to check whether a subroutine exists.

    foreach my $action ( $doc->findnodes('/fftg/actions/*') ) {
        my $actiontype = ucfirst( $action->nodeName );
        no strict 'refs';    # needed for the symbolic references below
        if ( exists &{"FFTG::${actiontype}::run"} ) {
            # &{"name"}(ARGS) calls the named function directly with ARGS
            &{"FFTG::${actiontype}::run"}( $parsed_node_information );
        } else {
            # this module was not loaded
        }
    }
    

    Unfortunately the non-OO approach requires the no strict 'refs', which is not pretty. It's probably better to do it in an object-oriented fashion. But I'll stick with this for the answer.
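
    If the symbolic reference is the main objection, one alternative sketch is to ask the package for its run function with can, which every package inherits from UNIVERSAL and which returns a code reference (or a false value when there is no such function). The FFTG::Rename body below is a stand-in defined inline, and find_runner is a hypothetical helper name.

    ```perl
    use strict;
    use warnings;

    # Stand-in for a real FFTG::Rename, defined inline to keep this self-contained.
    {
        package FFTG::Rename;
        sub run {
            my ($args) = @_;
            return "rename $args->{file}{name} to $args->{file}{to}";
        }
    }

    # Hypothetical helper: code reference to FFTG::<type>::run, or a false value.
    sub find_runner {
        my ($actiontype) = @_;
        return "FFTG::${actiontype}"->can('run');
    }

    for my $actiontype (qw(Rename Compress)) {
        if ( my $run = find_runner($actiontype) ) {
            # plain function call through the reference, no "no strict 'refs'"
            print $run->( { file => { name => 'foo', to => 'bar' } } ), "\n";
        }
        else {
            print "no module loaded for $actiontype\n";
        }
    }
    ```

    Note that can also finds inherited methods, so it tells you that the package responds to run, not that run is defined in that exact package.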

    The clear downside of this approach is that you need to load all of the modules every time, and whenever a new one is created, it has to be added to the script. This is the least complex way, but it also needs the most maintenance.

    Automatic loading with a lookup table

    Another way is to use automatic loading and a lookup table that defines actions that are allowed. If you want your program to only load the modules on demand because you know that you don't need all of them in every invocation, but you also want to have control over what gets loaded, this makes sense.

    Instead of loading all of them, the loading can be outsourced to Module::Runtime.

    use Module::Runtime 'require_module';
    use Try::Tiny;
    
    my %modules = (
        'rename' => 'FFTG::Rename',
    
        # ...
    );
    
    foreach my $action ( $doc->findnodes('/fftg/actions/*') ) {
        my $actiontype = $action->nodeName;    # e.g. "rename"
        try {
            # the table is keyed by node name, not by the node object
            my $module = $modules{$actiontype}
                or die "Action '$actiontype' is not in the lookup table\n";
            require_module $module;
    
            no strict 'refs';
            &{"${module}::run"}($parsed_node_information);
        }
        catch {
            # something went wrong
            # maybe the module does not exist or it's not listed in the lookup table
            warn $_;
        };
    }
    

    I've also added Try::Tiny to take care of error handling. It gives you control over what to do when stuff goes wrong.

    This approach lets you control what actions are allowed, which is good if you're paranoid. But it still requires you to maintain the script and add new modules to the %modules lookup table.

    Trusting and loading dynamically

    A third, most generic approach would be to use Module::Runtime to load stuff dynamically without the lookup table.

    use Module::Runtime 'require_module';
    use Try::Tiny;
    
    foreach my $action ( $doc->findnodes('/fftg/actions/*') ) {
        try {
            my $actiontype = ucfirst($action->nodeName);
            require_module "FFTG::${actiontype}";
    
            no strict 'refs';
            &{"FFTG::${actiontype}::run"}($parsed_node_information);
        }
        catch {
            # something went wrong
            # most likely the module does not exist
            warn $_;
        };
    }
    

    This has the least maintenance, but it's a bit more dangerous. You don't know what data is coming in, and there is no longer a sanity check on it. I can't think of a way to exploit this off the top of my head, but there could be one. On the other hand, you no longer have to edit the script or keep a module list up to date.
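
    A cheap sanity check narrows what can reach require_module. The sketch below, in plain Perl, only accepts node names made of lowercase ASCII letters, digits and underscores before building the module name. module_for_action is a hypothetical helper, and the exact pattern is an assumption you would adapt to your own action names.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: map an XML node name to a module name, or return
    # nothing if the node name does not look like a simple action name.
    sub module_for_action {
        my ($node_name) = @_;
        return unless defined $node_name;
        return unless $node_name =~ /\A[a-z][a-z0-9_]*\z/;
        return 'FFTG::' . ucfirst($node_name);
    }

    for my $name ( 'rename', 'transfer', 'x:evil', '../../etc/passwd' ) {
        my $module = module_for_action($name);
        print defined $module ? "$name => $module\n" : "$name => rejected\n";
    }
    ```

    Anything that is rejected can be reported and skipped before a module name is ever built from it.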

    Conclusion

    I would probably go with the second approach. It gives you control and still keeps stuff dynamic. I would not go with the non-OOP approach I have used.

    You don't need full objects, though. You can get rid of the no strict 'refs' by using the -> notation to call run as a class method. Then your package would look like this.

    package FFTG::Rename;
    
    use strict;
    use warnings;
    
    sub run {
        my (undef, $args) = @_;
    
        # ...
    }
    

    The undef in the argument list discards the first argument, which for a class method call is the class name ($class, not an object $self), because we don't need it here. If you do need it, for logging say, capture it instead. With this in place, the lookup table solution can call the class method as follows.

    my $module = $modules{ $action->nodeName };    # key is the node name
    require_module $module;
    $module->run($parsed_node_information);
    

    This is obviously way clearer and preferable.
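
    Putting the pieces together, a self-contained sketch of the class-method variant might look like the following. The two packages are stand-ins defined inline, so require_module is left out here; with real modules on disk you would load each one before calling run. The helper name run_action and the return values are made up for illustration.

    ```perl
    use strict;
    use warnings;

    # Two stand-in action classes exposing the same interface: a run class method.
    {
        package FFTG::Rename;
        sub run {
            my ( $class, $args ) = @_;
            return "${class}: $args->{name} to $args->{to}";
        }
    }
    {
        package FFTG::Transfer;
        sub run {
            my ( $class, $args ) = @_;
            return "${class}: $args->{name} via $args->{protocol}";
        }
    }

    # Lookup table from XML node name to the class handling it.
    my %modules = (
        rename   => 'FFTG::Rename',
        transfer => 'FFTG::Transfer',
    );

    # Hypothetical helper: dispatch one action, dying on unknown action names.
    sub run_action {
        my ( $name, $args ) = @_;
        my $module = $modules{$name} or die "Unknown action '$name'\n";
        return $module->run($args);
    }

    print run_action( rename   => { name => 'foo', to => 'bar' } ), "\n";
    print run_action( transfer => { name => 'bar', protocol => 'SFTP' } ), "\n";
    ```

    Because every class answers to the same run method, the dispatching code never needs to know which action it is dealing with.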