cloud ubuntu-9.10 hpc pvm

PVM terminates after Adding Host


On Ubuntu 9.10 with PVM 3.4.5-12 (the version of the PVM package that apt-get installs), pvm terminates right after I add a host.

laptop> pvm
pvm> add bowtie-slave
add bowtie-slave
terminated
laptop>

The only configuration I have set is $PVM_RSH = /usr/bin/ssh
I can ssh perfectly fine into the slave without a password, and run commands on it.

Any ideas?
Thanks in advance!

Here are the sample logs:

Laptop log

[t80040000] 02/11 10:23:32 laptop (127.0.1.1:xxxxx) LINUX 3.4.5
[t80040000] 02/11 10:23:32 ready Thu Feb 11 10:23:32 2010
[t80040000] 02/11 10:23:32 netoutput() sendto: errno=22
[t80040000] 02/11 10:23:32 em=0x2c24f0
[t80040000] 02/11 10:23:32 [49/à][6e/à][76/à][61/à][6c/à][69/à][64/à][20/à][61/à][72/à]
[t80040000] 02/11 10:23:32 netoutput() sendto: Invalid argument
[t80040000] 02/11 10:23:32 pvmbailout(0)

bowtie-slave log

[t80080000] 02/11 10:23:25 bowtie-slave (xxx.x.x.xxx:xxxxx) LINUX64 3.4.5
[t80080000] 02/11 10:23:25 ready Thu Feb 11 10:23:25 2010
[t80080000] 02/11 10:28:26 work() run = STARTUP, timed out waiting for master
[t80080000] 02/11 10:28:26 pvmbailout(0)


Solution

  • I've also been struggling with this problem. I just found a couple of things that were failing for me.

    First, my master host was starting with a node name that the slave host did not recognize. That is, it was calling itself "foobar" when it really should have been "foobar.example.com" so that the slave knew how to talk to it. You specify this by starting the master console like this:

    pvm -nfoobar.example.com
    

    I also specified the full name of the slave. So in the console:

    add baz.mumble.example.com
    

    Then I had a problem where the console would hang when I added the slave. Hey, at least it's not just stopping! It turned out to be the firewall on the slave host: the traffic was getting dropped. (The pvmds don't communicate over ssh after setup; they talk over a separate port.) Unfortunately, running without a firewall is not an option for that host. By default, pvmd picks a random port number, which is not what I want. Apparently there's an undocumented environment variable, PVMNETSOCKPORT, that controls which ports it uses. Right now I'm working on getting that set correctly so that I can poke the right hole in my firewall.

    Good luck! I'll try to update this answer if I get any further.
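
A related check that may help: the laptop log in the question reports the master as 127.0.1.1, which is the loopback entry that Debian and Ubuntu typically put in /etc/hosts for the machine's own hostname. A slave cannot reach the master at a loopback address, so it may be worth confirming what each name resolves to on each machine before starting pvm. The following is a minimal Python sketch of that check, not anything PVM ships; the function names (resolve_ipv4, is_loopback, check_name) are made up for illustration.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses hostname resolves to here."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def is_loopback(address):
    """True if address is a loopback address (e.g. anything in 127.0.0.0/8)."""
    return ipaddress.ip_address(address).is_loopback


def check_name(hostname):
    """Return (address, is_loopback) pairs for every IPv4 address of hostname."""
    return [(addr, is_loopback(addr)) for addr in resolve_ipv4(hostname)]


if __name__ == "__main__":
    # Check the name this machine would announce to other hosts.
    name = socket.gethostname()
    try:
        for addr, loop in check_name(name):
            note = "loopback, other hosts cannot reach this" if loop else "ok"
            print(f"{name} -> {addr} ({note})")
    except socket.gaierror as exc:
        print(f"{name} does not resolve: {exc}")
```

Run it on the master to see what the master's own name resolves to, and call check_name with the master's full name on the slave to see whether the slave resolves that name at all. If the master's name comes back as a loopback address, starting the console with the -n option shown in the answer, using a name that maps to a routable address, is the thing to try.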