
Munin unable to load graphs: socket read error


-bash-4.1$ /usr/share/munin/munin-update --nofork --debug -host ec2-184-77-97-201.compute-1.amazonaws.com
2013/11/17 03:35:22 [DEBUG] Creating new lock file /var/run/munin/munin-update.lock
2013/11/17 03:35:22 [DEBUG] Creating lock : /var/run/munin/munin-update.lock succeeded
2013/11/17 03:35:22 [INFO]: Starting munin-update
2013/11/17 03:35:22 [DEBUG] Lock /var/run/munin/munin-compute-1.amazonaws.com-ec2-184-77-97-201.compute-1.amazonaws.com.lock already exists, checking process
2013/11/17 03:35:22 [DEBUG] Lock contained pid '12147'
2013/11/17 03:35:22 [INFO] Process 12147 is dead, stealing lock, removing file
2013/11/17 03:35:22 [DEBUG] Creating new lock file /var/run/munin/munin-compute-1.amazonaws.com-ec2-184-77-97-201.compute-1.amazonaws.com.lock
2013/11/17 03:35:22 [DEBUG] Creating lock : /var/run/munin/munin-compute-1.amazonaws.com-ec2-184-77-97-201.compute-1.amazonaws.com.lock succeeded
2013/11/17 03:35:22 [DEBUG] Reading state for compute-1.amazonaws.com-ec2-184-77-97-201.compute-1.amazonaws.com in /var/lib/munin/state-compute-1.amazonaws.com-ec2-184-77-97-201.compute-1.amazonaws.com.storable
2013/11/17 03:35:22 [INFO] starting work in 12544 for ec2-184-77-97-201.compute-1.amazonaws.com/ec2-184-77-97-201.compute-1.amazonaws.com:4949.
2013/11/17 03:35:22 [FATAL] Socket read from ec2-184-77-97-201.compute-1.amazonaws.com failed.  Terminating process. at /usr/share/perl5/vendor_perl/Munin/Master/UpdateWorker.pm line 254
2013/11/17 03:35:22 [WARNING] Failed worker compute-1.amazonaws.com;ec2-184-73-97-201.compute-1.amazonaws.com
2013/11/17 03:35:22 [DEBUG] Creating new lock file /var/run/munin/munin-datafile.lock
2013/11/17 03:35:22 [DEBUG] Creating lock : /var/run/munin/munin-datafile.lock succeeded
2013/11/17 03:35:22 [INFO] No old data available for failed worker compute-1.amazonaws.com;ec2-184-77-97-201.compute-1.amazonaws.com.  This node will disappear from the html web page hierarchy
2013/11/17 03:35:22 [DEBUG] Writing state to /var/lib/munin/datafile.storable
2013/11/17 03:35:22 [INFO]: Munin-update finished (0.06 sec)

Server packages:

munin-common-2.0.17-1.30.amzn1.noarch
munin-node-2.0.17-1.30.amzn1.noarch
munin-2.0.17-1.30.amzn1.noarch

Client packages:

munin-common-2.0.17-1.30.amzn1.noarch
munin-node-2.0.17-1.30.amzn1.noarch

Telnet test:

[root@nagios ec2-user]# telnet ec2-184-77-97-201.compute-1.amazonaws.com 4949
Trying 10.73.133.164...
Connected to ec2-184-77-97-201.compute-1.amazonaws.com.
Escape character is '^]'.
Connection closed by foreign host.
[root@nagios ec2-user]#

iptables is disabled on both the client and the server.

Telnet to localhost works fine:

[root@nagios ec2-user]# telnet localhost 4949
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
# munin node at nagios.xyz.com

Error in the munin-node log on the client:

2013/11/17-03:38:03 CONNECT TCP Peer: "10.202.30.240:54208" Local: "10.73.133.164:4949"
2013/11/17-03:38:03 [1906] Denying connection from: 10.202.30.240
2013/11/17-03:40:02 CONNECT TCP Peer: "10.202.30.240:45094" Local: "10.73.133.164:4949"
2013/11/17-03:40:02 [1909] Denying connection from: 10.202.30.240

Can someone help me here?


Solution

  • The allow regex in munin-node.conf on the client was wrong, so the node denied connections from the Munin server.

    Add the Munin server's IP to munin-node.conf on the client:

    allow 10.202.30.210
    
    Then restart the munin-node service.
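For context, munin-node treats each allow value as a regular expression matched against the connecting peer's address, so a slightly wrong pattern shows up as "Denying connection" in the node log. The following is a minimal Python sketch, not part of Munin, for checking a candidate pattern against an address before restarting the service. The helper names parse_allow_patterns and is_allowed are hypothetical, and Python's re module is assumed to behave like the node's Perl regexes for simple IP patterns.

```python
import re


def parse_allow_patterns(conf_text):
    """Collect the values of all 'allow' lines from munin-node.conf text."""
    patterns = []
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "allow":
            patterns.append(parts[1].strip())
    return patterns


def is_allowed(peer_ip, patterns):
    """Return True if any allow pattern matches the peer address.

    Assumption: an unanchored regex search approximates how the node
    applies its allow rules.
    """
    return any(re.search(pattern, peer_ip) for pattern in patterns)
```

Running the address from the node's "Denying connection" lines through is_allowed shows quickly whether the configured rule covers the server that is actually connecting.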
    

    It worked this time:

    [root@nagios ec2-user]# telnet ec2-184-73-97-201.compute-1.amazonaws.com 4949
    Trying 10.73.133.164...
    Connected to ec2-184-73-97-201.compute-1.amazonaws.com.
    Escape character is '^]'.
    # munin node at ip-10-73-133-164.ec2.internal
    nodes
    ip-10-73-133-164.ec2.internal
    .
    exit
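
The telnet check can also be scripted. Below is a minimal Python sketch, assuming only the behaviour seen in this thread: an allowed peer receives a "# munin node at ..." greeting, while a denied peer has the connection closed without any data. The function name read_munin_banner is hypothetical.

```python
import socket


def read_munin_banner(host, port=4949, timeout=5.0):
    """Connect to a munin-node and return its greeting line.

    Returns None when the node closes the connection without sending
    anything, which is what a denied peer sees in the telnet test.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The greeting is short, so the first chunk is enough here.
            data = sock.recv(1024)
    except ConnectionResetError:
        return None
    if not data:
        return None
    return data.decode("ascii", errors="replace").strip()
```

Calling read_munin_banner with the node's hostname from the Munin server gives None while the allow rule is wrong and the greeting string once it is fixed.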