Tags: postgresql, hortonworks-data-platform, ambari

Ambari server doesn't restart after removing a node with Cloudbreak


After using Cloudbreak to add a node (to test scaling) and then remove it, the ambari-server service won't restart.

The error at launch is:

DB configs consistency check failed. Run "ambari-server start --skip-database-check" to skip. You may try --auto-fix-database flag to attempt to fix issues automatically. If you use this "--skip-database-check" option, do not make any changes to your cluster topology or perform a cluster upgrade until you correct the database consistency issues. See /var/log/ambari-server/ambari-server-check-database.log for more details on the consistency issues.

Looking at the logs doesn't reveal much more. I tried restarting PostgreSQL, and sometimes that works, roughly 1 time in 10 (how is that even possible?).


Solution

  • Rather than just restarting PostgreSQL, I dug into the database itself.

    I opened the ambari database to inspect it (the password is bigdata):

    sudo su - postgres
    psql ambari -U ambari -W -p 5432

    When I looked at the tables topology_logical_request, topology_request and topology_hostgroup, I saw that the cluster had never registered a remove request, only the add request:

    ambari=> select * from topology_logical_request;
     id | request_id |                        description
    ----+------------+-----------------------------------------------------------
      1 |          1 | Logical Request: Provision Cluster 'sentelab-perf'
     62 |         51 | Logical Request: Scale Cluster 'sentelab-perf' (+1 hosts)
    

    Identify the IDs to delete (track down every request tied to the add-node operation), then delete the rows in this order (order matters):

    BEGIN;
    -- Child rows first: both tables reference topology_request.
    DELETE FROM topology_hostgroup WHERE id = 51;
    DELETE FROM topology_logical_request WHERE id = 62;
    DELETE FROM topology_request WHERE id = 51;
    COMMIT;
    

    Quit psql with \q, restart Ambari (ambari-server restart), and it works!
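The "order matters" remark comes from foreign keys between these tables. The sketch below is a simplified model, not Ambari itself: it assumes that topology_hostgroup and topology_logical_request each reference topology_request(id), models only those three tables in an in-memory SQLite database (not Ambari's PostgreSQL), and loads rows based on the query output shown in the solution. The helper names (build_db, scale_request_ids, delete_parent_first, delete_scale_request) are hypothetical, and host groups are deleted by request_id rather than by id, which is a variant of the statements in the post, not what the author ran.

```python
import sqlite3

# Hypothetical, simplified mirror of three Ambari topology tables. Only the
# columns needed to show the foreign keys are modelled; the real schema has
# more columns and may have further tables referencing these rows.
SCHEMA = """
CREATE TABLE topology_request (
    id          INTEGER PRIMARY KEY,
    description TEXT
);
CREATE TABLE topology_hostgroup (
    id         INTEGER PRIMARY KEY,
    request_id INTEGER NOT NULL REFERENCES topology_request(id)
);
CREATE TABLE topology_logical_request (
    id          INTEGER PRIMARY KEY,
    request_id  INTEGER NOT NULL REFERENCES topology_request(id),
    description TEXT
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding rows like those shown in the post."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO topology_request (id, description) VALUES (?, ?)",
        [(1, "Provision Cluster"), (51, "Scale Cluster")],
    )
    # Host group ids are made up; each group points at the request that created it.
    conn.executemany(
        "INSERT INTO topology_hostgroup (id, request_id) VALUES (?, ?)",
        [(10, 1), (51, 51)],
    )
    conn.executemany(
        "INSERT INTO topology_logical_request (id, request_id, description) VALUES (?, ?, ?)",
        [
            (1, 1, "Logical Request: Provision Cluster 'sentelab-perf'"),
            (62, 51, "Logical Request: Scale Cluster 'sentelab-perf' (+1 hosts)"),
        ],
    )
    conn.commit()
    return conn


def scale_request_ids(conn: sqlite3.Connection) -> list[int]:
    """Return topology_request ids whose logical request describes a scale-up."""
    rows = conn.execute(
        "SELECT request_id FROM topology_logical_request "
        "WHERE description LIKE '%Scale Cluster%' ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


def delete_parent_first(conn: sqlite3.Connection, request_id: int) -> None:
    """Wrong order: the parent row is still referenced, so this raises."""
    conn.execute("DELETE FROM topology_request WHERE id = ?", (request_id,))


def delete_scale_request(conn: sqlite3.Connection, request_id: int) -> None:
    """Delete child rows before their parent, inside one transaction."""
    with conn:  # commits on success, rolls back if any DELETE fails
        conn.execute("DELETE FROM topology_hostgroup WHERE request_id = ?", (request_id,))
        conn.execute("DELETE FROM topology_logical_request WHERE request_id = ?", (request_id,))
        conn.execute("DELETE FROM topology_request WHERE id = ?", (request_id,))


if __name__ == "__main__":
    conn = build_db()
    ids = scale_request_ids(conn)
    print("scale requests:", ids)
    try:
        delete_parent_first(conn, ids[0])
    except sqlite3.IntegrityError as exc:
        print("parent-first delete rejected:", exc)
    for request_id in ids:
        delete_scale_request(conn, request_id)
    print(conn.execute("SELECT id, description FROM topology_logical_request").fetchall())
```

Deleting the parent first is rejected because rows in the child tables still point at it, while the child-first order succeeds. Grouping the three deletes in one transaction means a failure part-way leaves nothing half-deleted; the BEGIN/COMMIT pair around the PostgreSQL statements serves the same purpose.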