certificate, puppet, hosts, puppet-enterprise

log file shows intermittent success and failure


I will try a brief version first, then I can add more information as requested.

I have a client machine with the following configuration:

------------------------------------------------------------
Connected to puppet-client-10 as root
Debian 7.8 wheezy (amd64)
------------------------------------------------------------
FQDN        : puppet-client-10.mydomain
IP          : 161.148.1.10

PuppetMaster: puppet-master.mydomain
Puppet      : 3.7.5
Facter      : 2.2.0
------------------------------------------------------------

It connects to the puppet master below:

------------------------------------------------------------
Connected to puppet-master as root
Debian 7.8 wheezy (amd64)
------------------------------------------------------------
FQDN        : puppet-master.mydomain
IP          : 161.148.1.1

Puppet      : 3.7.5
Facter      : 2.4.3
------------------------------------------------------------

Now, back to the client. I used to have the agent daemon disabled and checked for updates via cron once a day:

6 22 * * * root /usr/bin/puppet agent --test --logdest syslog

That worked flawlessly.

Two days ago I commented out the cron job and enabled the agent to check for updates every hour.

Then the logs started showing this line every 2 minutes:

<27>1 2015-05-20T08:20:30.651767-03:00 puppet-client-10 puppet-agent 8072 - -  Could not request certificate: getaddrinfo: Name or service not known
<27>1 2015-05-20T08:22:30.668988-03:00 puppet-client-10 puppet-agent 8072 - -  Could not request certificate: getaddrinfo: Name or service not known

The log also shows that the client is correctly checking the master for updates:

<28>1 2015-05-20T08:23:44.927447-03:00 puppet-client-10 puppet-agent 31500 - -  Loading class elasticsearch
<28>1 2015-05-20T08:23:45.406158-03:00 puppet-client-10 puppet-agent 31500 - -  Loading class logstash
<28>1 2015-05-20T08:23:45.776948-03:00 puppet-client-10 puppet-agent 31500 - -  Loading class logrotate
<28>1 2015-05-20T08:23:46.204161-03:00 puppet-client-10 puppet-agent 31500 - -  Loading class puppet

And then it goes back to the getaddrinfo error every 2 minutes:

<27>1 2015-05-20T08:24:30.676307-03:00 puppet-client-10 puppet-agent 8072 - -  Could not request certificate: getaddrinfo: Name or service not known
<27>1 2015-05-20T08:26:30.683570-03:00 puppet-client-10 puppet-agent 8072 - -  Could not request certificate: getaddrinfo: Name or service not known

It keeps alternating between the error (every 2 minutes) and success (every hour) messages.
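
To see which process each kind of message comes from, the log lines can be grouped by PID. In the excerpts above, the certificate errors all carry PID 8072, while the class-loading lines carry PID 31500. The following is only a minimal sketch: it assumes the RFC 5424 style layout shown in those excerpts (application name in the fourth field, PID in the fifth), and the function name `summarise_agent_pids` is made up for illustration.

```shell
#!/bin/sh
# Summarise puppet-agent log lines per PID.
# Assumed layout (as in the excerpts above):
#   <PRI>VERSION TIMESTAMP HOST APP PID MSGID SD MESSAGE
# Reads the files given as arguments, or stdin when none are given.
summarise_agent_pids() {
    awk '
        $4 == "puppet-agent" {
            total[$5]++                                   # every line from this PID
            if ($0 ~ /Could not request certificate/)
                errors[$5]++                              # certificate-request failures only
        }
        END {
            for (pid in total)
                printf "%s total=%d cert_errors=%d\n", pid, total[pid], errors[pid]
        }
    ' "$@" | sort -n                                      # order the summary by PID
}
```

It could be called as `summarise_agent_pids /var/log/syslog`, where the log path is an assumption and depends on the local syslog configuration. If every error belongs to one PID and every successful run to another, that points at a single misbehaving process rather than at name resolution in general.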

Executing the command puppet agent --test works as expected.
The problem seems to be with the agent daemon.

Any hints?


I would guess it is because your puppet master isn't named "puppet". I'd also check which user the puppet agent you now have running is running as; probably not root, I'd guess. – Vorsprung

It is named puppet-master, also puppet-master.mydomain, with the alt names below:

# puppet cert list puppet-master.mydomain  

+ "puppet-master.mydomain" (SHA256) F2:54:03:9C 
  (alt names: "DNS:puppet", "DNS:puppet.mydomain", "DNS:puppet-master.mydomain")  

It is running as root:

# ps aux | grep puppet

root      1763  0.0  0.2 133776 45236 ?        Ssl  Mai19   0:07 /usr/bin/ruby /usr/bin/puppet agent
root      8072  0.0  0.2 194580 40144 ?        Ssl  Mai19   0:02 /usr/bin/ruby /usr/bin/puppet agent

Right now, 8072 above is the process spamming the error line.

Should I really have 2 processes running?


Solution

  • The error indicates a failure to resolve a hostname to an IP address. Since the hourly run and the manual run both succeed, I don't think your name resolution is misconfigured.

    You should have only a single puppet-agent process running. I would stop the puppet-agent service, make sure all of its processes have been killed, restart the service, and confirm that only one process is running.

    My bet is that one of those processes is doing something silly.
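
    The steps above could be sketched roughly as follows. This is a hedged outline, not a tested procedure for this machine: it assumes the init script is called `puppet` (it may differ on your install), and the helper names `count_agents` and `restart_agent_cleanly` are invented for illustration. Note that `pkill -f 'puppet agent'` would also end a manual `puppet agent --test` run that happens to be in progress.

```shell
#!/bin/sh
# Count "puppet agent" processes in `ps` output read from stdin.
# The [p] in the pattern keeps this awk process from matching its own command line.
count_agents() {
    awk '/[p]uppet agent/ { n++ } END { print n + 0 }'
}

# Stop the service, remove leftover agents, start it again, report the count.
# Assumption: the service is named "puppet"; adjust to the local init script.
restart_agent_cleanly() {
    service puppet stop
    pkill -f 'puppet agent' || true   # no match is fine, nothing left to kill
    sleep 2                           # give the processes a moment to exit
    service puppet start
    ps aux | count_agents             # ideally prints 1
}
```

    Running `ps aux | count_agents` on its own, before changing anything, shows how many agents are currently alive; `restart_agent_cleanly` has to be run as root.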