
CDH4: Restore Cloudera Manager to an existing cluster


Our Cloudera Manager (4.7) node in production went awry, so we installed a fresh OS on that node. We are trying to recover Cloudera Manager from our backups of its (embedded) PostgreSQL DB. We hope that, with the restored DB, CM can manage the existing cluster with its existing configuration.

We are running a few POCs in which we port Cloudera Manager to a new server using the steps below. (Eventually we will install CM on the same node.)

  1. Install cloudera-server-daemons and cloudera-server.
  2. Install cloudera-server-db.
  3. sudo service cloudera-server-db start => this creates the basic roles, regenerates passwords, etc.
  4. From our pg_dumpall output (foo.sql) we removed the initial statements that create the roles, their passwords, and the database, then ran psql -U cloudera-scm -h localhost -p 7432 -f foo.sql postgres. This completed successfully.
  5. On each node in the cluster, change /etc/cloudera-scm-agent/config.ini to point to the new node.
  6. sudo service cloudera-server start => we expected CM to pick up the configs and just load. However, it takes us to the installer page.
  7. Install the free edition. We either search for the IPs or see the available hosts.
  8. Next, it updates the CDH packages on each node in the cluster and asks us to install services.
  9. After this the process is a little unclear. We did manage to assign roles to the appropriate nodes; for example, HDFS reused the same root dir, was not reformatted, and everything seems OK. However, all our configuration is missing, which suggests that CM did not read from the restored DB.
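The file edits in steps 4 and 5 can be sketched in Python. This is a minimal sketch, not Cloudera tooling, and it rests on assumptions: that pg_dumpall wrote each CREATE ROLE, ALTER ROLE and CREATE DATABASE statement on a single line (true for its usual output, but a table row or function body starting with that text would also be dropped), and that the agent's config.ini names the CM host in a server_host= line. The script name cm_restore_helpers.py and its dump/agent subcommands are hypothetical.

```python
import re
import sys

# Single-line role/database statements in pg_dumpall output. Assumption (from
# step 3): `service cloudera-server-db start` has already re-created these.
SKIP_PREFIXES = ("CREATE ROLE ", "ALTER ROLE ", "CREATE DATABASE ")

# Active (uncommented) server_host line in the agent's config.ini (assumed key).
SERVER_HOST_RE = re.compile(r"^([ \t]*server_host[ \t]*=[ \t]*)")


def strip_role_and_db_statements(lines):
    """Step 4: yield dump lines, skipping role and database creation statements."""
    for line in lines:
        if line.lstrip().upper().startswith(SKIP_PREFIXES):
            continue
        yield line


def set_server_host(text, new_host):
    """Step 5: return config text with each active server_host line set to new_host."""
    out = []
    for line in text.splitlines(keepends=True):
        m = SERVER_HOST_RE.match(line)
        if m:
            ending = line[len(line.rstrip("\r\n")):]  # keep the original line ending
            line = m.group(1) + new_host + ending
        out.append(line)
    return "".join(out)


def main(argv):
    # surrogateescape + newline="" round-trip bytes and line endings unchanged.
    opts = {"encoding": "utf-8", "errors": "surrogateescape", "newline": ""}
    if len(argv) == 4 and argv[1] == "dump":
        # python cm_restore_helpers.py dump foo.sql foo.filtered.sql
        with open(argv[2], **opts) as fin, open(argv[3], "w", **opts) as fout:
            fout.writelines(strip_role_and_db_statements(fin))
    elif len(argv) == 4 and argv[1] == "agent":
        # python cm_restore_helpers.py agent /etc/cloudera-scm-agent/config.ini NEW_HOST
        with open(argv[2], **opts) as f:
            text = f.read()
        with open(argv[2], "w", **opts) as f:
            f.write(set_server_host(text, argv[3]))


if __name__ == "__main__":
    main(sys.argv)
```

The filtered dump would then be replayed with the same psql command as in step 4. After editing config.ini, the agent typically needs a restart (for example sudo service cloudera-scm-agent restart) before it contacts the new host.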

The steps above do not seem to be the right way to restore the state of Cloudera Manager. This reference possibly describes a seamless way to do it, but even after following its steps we still cannot get CM to read from the restored DB. Can someone point us to the right steps? Any help is appreciated.


Solution

  • After lots of POCs we came to the conclusion that the DB dump was useless. Fortunately, we still had the PostgreSQL /data directory.

    We chose the same machine for the reinstall, so there was no need to change hostnames and IP addresses in /etc/cloudera-scm-agent/config.ini. We installed the correct PostgreSQL version, cloudera-scm-server, cloudera-scm-server-db, cloudera-scm-agent, cloudera-scm-daemons, and their dependencies.

    One issue was that we had lost db.mgmt.properties, but we were able to alter the passwords of the database users (amon, hmon, smon, nav, etc.). The stored password is md5 of yourPassword concatenated with the user name, computed with PostgreSQL's md5 function, and it must be prefixed with 'md5'. For example, for amon: 'md5' || md5('yourPassword' || 'amon').

    Start cloudera-scm-server and all the services will show up. If there are database connection problems, go to the associated service (e.g. Activity Monitor), change its database password to yourPassword, and restart it.

    This worked for us. We did not need to install or reconfigure services.
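The password rule from the answer can be computed outside the database. This is a minimal sketch, assuming PostgreSQL's md5 password format ('md5' followed by the hex MD5 of the password concatenated with the role name); yourPassword is a placeholder and the role list is the one named in the answer, not necessarily complete for every CM install.

```python
import hashlib


def pg_md5_password(password, username):
    """Return PostgreSQL's stored MD5 form: 'md5' + md5(password || username)."""
    digest = hashlib.md5((password + username).encode("utf-8")).hexdigest()
    return "md5" + digest


def alter_role_sql(username, password):
    """Build an ALTER ROLE statement that sets an already-hashed MD5 password."""
    return f"ALTER ROLE {username} WITH PASSWORD '{pg_md5_password(password, username)}';"


if __name__ == "__main__":
    # Roles named in the answer; "yourPassword" is a placeholder.
    for role in ("amon", "hmon", "smon", "nav"):
        print(alter_role_sql(role, "yourPassword"))
```

Each printed statement can be run through psql against the embedded database (port 7432 in the question). PostgreSQL stores a 35-character value starting with md5 as an already-hashed password instead of hashing it a second time, which is why the prefix matters.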