
OpenSSH POE session stuck


I am making non-blocking ssh connections to over 300 radio devices on our network, using POE::Component::OpenSSH and a separate POE::Session for every device. The script works, but some of the devices have not been configured with the correct password. (For simplicity we keep the same password on all devices, so the script uses one password for every device.) Sometimes an engineer makes a mistake and installs a device with a non-standard ssh password. When OpenSSH fails to authenticate against such a device, the POE::Session gets stuck: _stop is never called and the script never exits. I know this is not purely a programming problem, but we install a few devices on our network every day and I cannot count on the engineers to set the correct password every time. Even if I correct the password on our existing devices, sooner or later a device with a wrong password will show up and cause my process to hang.

I could call $kernel->stop, but I don't like that. I would like the session to clean up after itself. Is there a way to release the resources/services used by POE::Component::OpenSSH inside the session, for example from a watcher session? Any help is appreciated. Please see my code below.

use strict;
use warnings;
use POE;
use POE::Component::OpenSSH;
use POE::Component::Client::Ping;
use Data::Dumper;

my $domain = shift;
my @args = ($domain);
my $session = POE::Session->create(
        args => \@args,
        inline_states => {
                _start => \&start,
                configcapture => \&configcapture,
                activecapture => \&activecapture,
                detectdfs => \&dfs,
                pong => \&pingresult,
                handlerror => \&handlerror,
                _stop   => \&stop,
        },
);

POE::Kernel->run();
exit;

sub start {
        $_[KERNEL]->sig( DIE => 'sig_DIE' );
        print "STARTING ... for $_[ARG0] \n";
        $_[HEAP]{'domain'} = $domain;

        my $ssh = POE::Component::OpenSSH->new(
                args => ['user@'.$domain.':6022', passwd => '123' ],
        );

        $ssh->capture({event => 'configcapture', timeout => 2},
                      'cat /tmp/system.cfg | grep radio.1.freq');
        $ssh->capture({event => 'activecapture', timeout => 2},
                      '/usr/www/status.cgi');
        $_[KERNEL]->delay(detectdfs => 5);
}

sub configcapture {
        $_[HEAP]{'configfreq'} = $_[ARG0]{'result'}[0];
}

sub activecapture {
        $_[HEAP]{'activefreq'} = $_[ARG0]{'result'}[0];
}

sub dfs {
        if(defined($_[HEAP]{'configfreq'}) &&
           defined($_[HEAP]{'activefreq'})) {
                print "CONFIG: $_[HEAP]{'configfreq'}\n";
                print "ACTIVE: $_[HEAP]{'activefreq'}\n";
        }
}

sub stop {
        my ($self, $output) = @_;
        print "Ending Session Here \n";
}

Solution

  • Thanks for the messages. I found the solution myself, though. I had to pass a timeout when constructing POE::Component::OpenSSH so that the _stop event was called for every session. Even then, the script still did not exit.

    POE::Component::OpenSSH uses POE::Component::Generic to spawn the blocking Net::OpenSSH process as a POE session, so I just had to call the shutdown method of POE::Component::Generic on the OpenSSH component objects that were stuck (because of a wrong password or anything else). After that, all of the sessions ended cleanly and the script exited.

    Please see the solution below:

    $_[HEAP]{'ssh'} = POE::Component::OpenSSH->new(
            args => [ 'user@domain', passwd => '123', timeout => 180, async => 1,
                      master_opts => [ -o => 'StrictHostKeyChecking=no' ] ],
    );
    

    and then, in _stop:

    my ($kernel, $session, $heap) = @_[KERNEL, SESSION, HEAP];
    delete $heap->{wheel};
    $kernel->alias_remove($heap->{alias});
    $kernel->alarm_remove_all();
    # Shut down the POE::Component::Generic object behind the ssh component
    $heap->{'ssh'}->object->shutdown;
    delete $heap->{'ssh'};
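
The question also asked about a watcher. The same shutdown call can be driven from a delayed event inside each session rather than only from _stop. The following is a minimal sketch of that idea, not tested against real devices. The helper shutdown_ssh, the handler names, the 30 second deadline and the uptime command are all assumptions made for illustration. Reading the output from $_[ARG0]{result}[0] follows the code in the question. I have not verified how quickly the spawned process goes away once shutdown is called on a connection that is still authenticating.

```perl
use strict;
use warnings;
use POE;
use POE::Component::OpenSSH;

# Hypothetical helper: shut the ssh component down at most once per session.
sub shutdown_ssh {
    my ($heap) = @_;
    my $ssh = delete $heap->{ssh} or return;   # already shut down
    # ->object is the underlying POE::Component::Generic object
    $ssh->object->shutdown;
    return 1;
}

sub on_start {
    my ($kernel, $heap, $host) = @_[KERNEL, HEAP, ARG0];
    $heap->{ssh} = POE::Component::OpenSSH->new(
        args => [ $host, passwd => '123', timeout => 180 ],
    );
    $heap->{ssh}->capture({ event => 'got_output' }, 'uptime');
    # Assumed deadline: give up on this device after 30 seconds
    $kernel->delay(watchdog => 30);
}

sub on_output {
    my ($kernel, $heap, $response) = @_[KERNEL, HEAP, ARG0];
    print "OUTPUT: $response->{result}[0]\n";
    $kernel->delay('watchdog');    # a delay with no time cancels the timer
    shutdown_ssh($heap);
}

sub on_watchdog {
    warn "no reply before the deadline, shutting down the ssh component\n";
    shutdown_ssh($_[HEAP]);
}

sub run_demo {
    my ($host) = @_;
    POE::Session->create(
        args          => [$host],
        inline_states => {
            _start     => \&on_start,
            got_output => \&on_output,
            watchdog   => \&on_watchdog,
            _stop      => sub { shutdown_ssh($_[HEAP]) },
        },
    );
    POE::Kernel->run();
}

# Only connect when a host such as user@device:6022 is given on the command line.
run_demo($ARGV[0]) if @ARGV;
```

Because shutdown_ssh deletes the component from the heap before shutting it down, calling it again from _stop after the watchdog has fired does nothing, so the component is never shut down twice.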