Tags: centos7, hyper-v, ambari

Ambari Confirm Hosts step fails: Registration with the server failed


I want to use Ambari to build a platform for testing some functionality on Spark. I used Windows 10 with Hyper-V to create two VMs (mercury.gc and venus.gc), both running CentOS 7. Ambari 2.2.2.0 is installed on one VM (mercury.gc), and I am trying to use it to configure both VMs. When I run the Confirm Hosts step, both hosts report failure. The following is the end of the log from one machine:

Connection to mercury.gc closed.
SSH command execution finished
host=mercury.gc, exitcode=0
Command end time 2016-07-18 00:29:58

Registering with the server...
Registration with the server failed.

I have checked that passwordless SSH login works properly, and that the firewall and SELinux have been turned off. I cannot figure out what is going wrong from the logs. Can anyone help me solve this problem?

The following is the complete log produced by Ambari:

==========================
Creating target directory...
==========================

Command start time 2016-07-18 00:29:51

Connection to mercury.gc closed.
SSH command execution finished
host=mercury.gc, exitcode=0
Command end time 2016-07-18 00:29:51

==========================
Copying common functions script...
==========================

Command start time 2016-07-18 00:29:51

scp /usr/lib/python2.6/site-packages/ambari_commons
host=mercury.gc, exitcode=0
Command end time 2016-07-18 00:29:52

==========================
Copying OS type check script...
==========================

Command start time 2016-07-18 00:29:52

scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py
host=mercury.gc, exitcode=0
Command end time 2016-07-18 00:29:52

==========================
Running OS type check...
==========================

Command start time 2016-07-18 00:29:52
Cluster primary/cluster OS family is redhat7 and local/current OS family is redhat7

Connection to mercury.gc closed.
SSH command execution finished
host=mercury.gc, exitcode=0
Command end time 2016-07-18 00:29:52

==========================
Checking 'sudo' package on remote host...
==========================

Command start time 2016-07-18 00:29:52
sudo-1.8.6p7-17.el7_2.x86_64

Connection to mercury.gc closed.
SSH command execution finished
host=mercury.gc, exitcode=0
Command end time 2016-07-18 00:29:53

==========================
Copying repo file to 'tmp' folder...
==========================

Command start time 2016-07-18 00:29:53

scp /etc/yum.repos.d/ambari.repo
host=mercury.gc, exitcode=0
Command end time 2016-07-18 00:29:53

==========================
Moving file to repo dir...
==========================

Command start time 2016-07-18 00:29:53

Connection to mercury.gc closed.
SSH command execution finished
host=mercury.gc, exitcode=0
Command end time 2016-07-18 00:29:53

==========================
Changing permissions for ambari.repo...
==========================

Command start time 2016-07-18 00:29:53

Connection to mercury.gc closed.
SSH command execution finished
host=mercury.gc, exitcode=0
Command end time 2016-07-18 00:29:54

==========================
Copying setup script file...
==========================

Command start time 2016-07-18 00:29:54

scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py
host=mercury.gc, exitcode=0
Command end time 2016-07-18 00:29:54

==========================
Running setup agent script...
==========================

Command start time 2016-07-18 00:29:54
("INFO 2016-07-18 00:16:30,697 threadpool.py:52 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2016-07-18 00:16:30,697 AlertSchedulerHandler.py:246 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2016-07-18 00:16:30,697 AlertSchedulerHandler.py:142 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x1de0ad0>; currently running: False
INFO 2016-07-18 00:16:32,701 hostname.py:89 - Read public hostname 'mercury.gc' using socket.getfqdn()
INFO 2016-07-18 00:16:32,805 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2016-07-18 00:16:32,805 ExitHelper.py:67 - Cleanup finished, exiting with code:0
INFO 2016-07-18 00:29:55,955 main.py:74 - loglevel=logging.INFO
INFO 2016-07-18 00:29:55,957 main.py:74 - loglevel=logging.INFO
INFO 2016-07-18 00:29:55,958 DataCleaner.py:39 - Data cleanup thread started
INFO 2016-07-18 00:29:55,961 DataCleaner.py:120 - Data cleanup started
INFO 2016-07-18 00:29:55,961 DataCleaner.py:122 - Data cleanup finished
INFO 2016-07-18 00:29:56,012 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2016-07-18 00:29:56,013 main.py:289 - Connecting to Ambari server at https://mercury.gc:8440 (192.168.137.100)
INFO 2016-07-18 00:29:56,013 NetUtil.py:60 - Connecting to https://mercury.gc:8440/ca
INFO 2016-07-18 00:29:56,099 threadpool.py:52 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2016-07-18 00:29:56,100 AlertSchedulerHandler.py:246 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2016-07-18 00:29:56,100 AlertSchedulerHandler.py:142 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x10e6ad0>; currently running: False
INFO 2016-07-18 00:29:58,103 hostname.py:89 - Read public hostname 'mercury.gc' using socket.getfqdn()
INFO 2016-07-18 00:29:58,207 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2016-07-18 00:29:58,207 ExitHelper.py:67 - Cleanup finished, exiting with code:0
", None)
("INFO 2016-07-18 00:16:30,697 threadpool.py:52 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2016-07-18 00:16:30,697 AlertSchedulerHandler.py:246 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2016-07-18 00:16:30,697 AlertSchedulerHandler.py:142 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x1de0ad0>; currently running: False
INFO 2016-07-18 00:16:32,701 hostname.py:89 - Read public hostname 'mercury.gc' using socket.getfqdn()
INFO 2016-07-18 00:16:32,805 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2016-07-18 00:16:32,805 ExitHelper.py:67 - Cleanup finished, exiting with code:0
INFO 2016-07-18 00:29:55,955 main.py:74 - loglevel=logging.INFO
INFO 2016-07-18 00:29:55,957 main.py:74 - loglevel=logging.INFO
INFO 2016-07-18 00:29:55,958 DataCleaner.py:39 - Data cleanup thread started
INFO 2016-07-18 00:29:55,961 DataCleaner.py:120 - Data cleanup started
INFO 2016-07-18 00:29:55,961 DataCleaner.py:122 - Data cleanup finished
INFO 2016-07-18 00:29:56,012 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2016-07-18 00:29:56,013 main.py:289 - Connecting to Ambari server at https://mercury.gc:8440 (192.168.137.100)
INFO 2016-07-18 00:29:56,013 NetUtil.py:60 - Connecting to https://mercury.gc:8440/ca
INFO 2016-07-18 00:29:56,099 threadpool.py:52 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2016-07-18 00:29:56,100 AlertSchedulerHandler.py:246 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2016-07-18 00:29:56,100 AlertSchedulerHandler.py:142 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x10e6ad0>; currently running: False
INFO 2016-07-18 00:29:58,103 hostname.py:89 - Read public hostname 'mercury.gc' using socket.getfqdn()
INFO 2016-07-18 00:29:58,207 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2016-07-18 00:29:58,207 ExitHelper.py:67 - Cleanup finished, exiting with code:0
", None)

Connection to mercury.gc closed.
SSH command execution finished
host=mercury.gc, exitcode=0
Command end time 2016-07-18 00:29:58

Registering with the server...
Registration with the server failed.

UPDATE: The following is my ambari-agent.log from mercury.gc:

INFO 2016-07-19 20:35:33,732 main.py:74 - loglevel=logging.INFO
INFO 2016-07-19 20:35:33,732 main.py:74 - loglevel=logging.INFO
INFO 2016-07-19 20:35:33,733 DataCleaner.py:39 - Data cleanup thread started
INFO 2016-07-19 20:35:33,734 DataCleaner.py:120 - Data cleanup started
INFO 2016-07-19 20:35:33,734 DataCleaner.py:122 - Data cleanup finished
INFO 2016-07-19 20:35:33,775 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2016-07-19 20:35:33,776 main.py:289 - Connecting to Ambari server at https://mercury.gc:8440 (192.168.137.100)
INFO 2016-07-19 20:35:33,776 NetUtil.py:60 - Connecting to https://mercury.gc:8440/ca
INFO 2016-07-19 20:35:33,870 threadpool.py:52 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2016-07-19 20:35:33,870 AlertSchedulerHandler.py:246 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2016-07-19 20:35:33,870 AlertSchedulerHandler.py:142 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x1a0aad0>; currently running: False
INFO 2016-07-19 20:35:35,874 hostname.py:89 - Read public hostname 'mercury.gc' using socket.getfqdn()
INFO 2016-07-19 20:35:35,999 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2016-07-19 20:35:36,000 ExitHelper.py:67 - Cleanup finished, exiting with code:0

The following is the /etc/hosts file on both hosts:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.137.100 mercury.gc mercury
192.168.137.101 venus.gc venus
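
As a quick sanity check of entries like these, the name the agent log reports via `socket.getfqdn()` can be compared against the file. This is only a minimal sketch, assuming the standard `/etc/hosts` layout shown above; `parse_hosts` and `check_local_fqdn` are hypothetical helper names, not part of Ambari.

```python
import socket


def parse_hosts(text):
    """Map every hostname or alias in hosts-file text to its IP address.

    If a name appears on several lines, the first entry wins.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, ip)
    return mapping


def check_local_fqdn(path="/etc/hosts"):
    """Return (fqdn, ip) for this machine, with ip=None if the FQDN
    is not listed in the hosts file at `path`."""
    with open(path) as handle:
        table = parse_hosts(handle.read())
    fqdn = socket.getfqdn()  # the same call the agent log mentions
    return fqdn, table.get(fqdn)
```

On mercury.gc, `check_local_fqdn()` would be expected to return the pair `('mercury.gc', '192.168.137.100')` if the file above is in effect.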

This is my ambari.properties in /etc/ambari-server/conf/

jdk1.7.dest-file=jdk-7u67-linux-x64.tar.gz
kerberos.keytab.cache.dir=/var/lib/ambari-server/data/cache
views.request.read.timeout.millis=10000
agent.package.install.task.timeout=1800
server.connection.max.idle.millis=900000
bootstrap.script=/usr/lib/python2.6/site-packages/ambari_server/bootstrap.py
server.version.file=/var/lib/ambari-server/resources/version
recovery.type=AUTO_START
api.authenticate=true
http.strict-transport-security=max-age=31536000
server.persistence.type=local
jdk1.8.jcpol-url=http://public-repo-1.hortonworks.com/ARTIFACTS/jce_policy-8.zip
jdk1.8.dest-file=jdk-8u60-linux-x64.tar.gz
rolling.upgrade.skip.packages.prefixes=
common.services.path=/var/lib/ambari-server/resources/common-services
http.x-frame-options=DENY
server.task.timeout=1200
jce.download.supported=true
agent.threadpool.size.max=25
recovery.lifetime_max_count=1024
jdk1.8.re=(jdk.*)/jre
ambari.python.wrap=ambari-python-wrap
ambari-server.user=root
agent.task.timeout=900
jdk1.7.url=http://public-repo-1.hortonworks.com/ARTIFACTS/jdk-7u67-linux-x64.tar.gz
server.jdbc.user.name=ambari
server.os_family=redhat7
java.home=/usr/java/jdk1.8.0_92/
server.jdbc.postgres.schema=ambari
jdk.name=jdk-8u60-linux-x64.tar.gz
user.inactivity.timeout.default=0
java.releases=jdk1.8,jdk1.7
skip.service.checks=false
shared.resources.dir=/usr/lib/ambari-server/lib/ambari_commons/resources
jdk.download.supported=true
recommendations.dir=/var/run/ambari-server/stack-recommendations
ulimit.open.files=10000
agent.stack.retry.tries=5

rolling.upgrade.min.stack=HDP-2.2
jdk1.8.desc=Oracle JDK 1.8 + Java Cryptography Extension (JCE) Policy Files 8
server.os_type=centos7
views.http.strict-transport-security=max-age=31536000
views.ambari.request.connect.timeout.millis=5000
views.request.connect.timeout.millis=5000
resources.dir=/var/lib/ambari-server/resources
custom.action.definitions=/var/lib/ambari-server/resources/custom_action_definitions
views.http.x-frame-options=SAMEORIGIN
recovery.enabled_components=METRICS_COLLECTOR
jdk1.7.re=(jdk.*)/jre
server.execution.scheduler.maxDbConnections=5
jdk1.7.desc=Oracle JDK 1.7 + Java Cryptography Extension (JCE) Policy Files 7
agent.stack.retry.on_repo_unavailability=false
views.ambari.request.read.timeout.millis=10000
jdk1.8.jcpol-file=jce_policy-8.zip
rolling.upgrade.max.stack=
server.http.session.inactive_timeout=1800
jdk1.7.jcpol-file=UnlimitedJCEPolicyJDK7.zip
server.execution.scheduler.misfire.toleration.minutes=480
security.server.keys_dir=/var/lib/ambari-server/keys
stackadvisor.script=/var/lib/ambari-server/resources/scripts/stack_advisor.py
server.tmp.dir=/var/lib/ambari-server/data/tmp
server.execution.scheduler.maxThreads=5
metadata.path=/var/lib/ambari-server/resources/stacks
server.fqdn.service.url=http://169.254.169.254/latest/meta-data/public-hostname
views.http.x-xss-protection=1; mode=block
webapp.dir=/usr/lib/ambari-server/web
bootstrap.dir=/var/run/ambari-server/bootstrap
#jdk1.7.home=/usr/jdk64/
jdk1.7.home=/usr/java/jdk1.8.0_92/
jdk1.8.url=http://public-repo-1.hortonworks.com/ARTIFACTS/jdk-8u60-linux-x64.tar.gz
#jdk1.8.home=/usr/jdk64/
jdk1.8.home=/usr/java/jdk1.8.0_92/
user.inactivity.timeout.role.readonly.default=0
http.x-xss-protection=1; mode=block
jce.name=jce_policy-8.zip
client.threadpool.size.max=25
jdk1.7.jcpol-url=http://public-repo-1.hortonworks.com/ARTIFACTS/UnlimitedJCEPolicyJDK7.zip

server.jdbc.user.passwd=/etc/ambari-server/conf/password.dat
server.execution.scheduler.isClustered=false
server.stages.parallel=true
bootstrap.setup_agent.script=/usr/lib/python2.6/site-packages/ambari_server/setupAgent.py
server.jdbc.database=postgres
server.jdbc.database_name=ambari

Solution

  • I found and solved the problem by following the discussion at https://community.hortonworks.com/questions/23409/there-is-a-problem-when-install-hdp-on-the-stepcon.html

    I think the cause is that I set a non-English language (i.e., Traditional Chinese) as the default language when I installed CentOS 7. This leads to a charset problem (UTF-8 <-> ASCII) when confirming hosts. After I changed the default language to English, the problem was solved.
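
To check whether a host might be affected, one option is to look at the locale environment variables that a process on that host inherits. The sketch below is illustrative only: `non_english_locale_vars` is a hypothetical helper, and treating only `C`, `POSIX`, and `en*` locales as safe is an assumption based on the fix described above, not something taken from Ambari itself.

```python
import os

# Variables that determine the process locale, highest precedence first.
LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG")


def non_english_locale_vars(env=None):
    """Return {name: value} for locale variables set to a non-English locale.

    Assumption: "C", "POSIX", and "en" / "en_*" locales are considered safe.
    """
    env = os.environ if env is None else env
    suspicious = {}
    for name in LOCALE_VARS:
        value = env.get(name, "")
        if not value:
            continue
        # "zh_TW.UTF-8@modifier" -> "zh_TW"
        language = value.split(".")[0].split("@")[0]
        if language in ("C", "POSIX", "en") or language.startswith("en_"):
            continue
        suspicious[name] = value
    return suspicious


if __name__ == "__main__":
    found = non_english_locale_vars()
    if found:
        print("Non-English locale settings:", found)
    else:
        print("Locale variables look English.")
```

If this reports a non-English locale, changing the system default on CentOS 7 with `localectl set-locale LANG=en_US.UTF-8` and logging in again should correspond to the fix described above.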