
Nova compute and network services are unable to contact the nova services after the management node restarts

I have a two-node OpenStack setup.

The first node runs the management services, such as nova-api, nova-scheduler, glance, ... The second node runs the network and compute services.

When I run `nova-manage service list`, all services show up.

When I restart the management node (node 1), nova-compute is disconnected.

When nova-compute tries to reconnect to the management node, this error appears in the compute log:

2013-01-21 20:49:28 TRACE nova.manager Traceback (most recent call last):
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib/python2.6/site-packages/nova/", line 155, in periodic_tasks
2013-01-21 20:49:28 TRACE nova.manager     task(self, context)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib/python2.6/site-packages/nova/compute/", line 2244, in _heal_instance_info_cache
2013-01-21 20:49:28 TRACE nova.manager     context,
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib/python2.6/site-packages/nova/db/", line 594, in instance_get_all_by_host
2013-01-21 20:49:28 TRACE nova.manager     return IMPL.instance_get_all_by_host(context, host)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib/python2.6/site-packages/nova/db/sqlalchemy/", line 103, in wrapper
2013-01-21 20:49:28 TRACE nova.manager     return f(*args, **kwargs)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib/python2.6/site-packages/nova/db/sqlalchemy/", line 1582, in instance_get_all_by_host
2013-01-21 20:49:28 TRACE nova.manager     return _instance_get_all_query(context).filter_by(host=host).all()
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/orm/", line 1922, in all
2013-01-21 20:49:28 TRACE nova.manager     return list(self)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/orm/", line 2032, in __iter__
2013-01-21 20:49:28 TRACE nova.manager     return self._execute_and_instances(context)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/orm/", line 2047, in _execute_and_instances
2013-01-21 20:49:28 TRACE nova.manager     result = conn.execute(querycontext.statement, self._params)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/", line 1399, in execute
2013-01-21 20:49:28 TRACE nova.manager     params)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/", line 1532, in _execute_clauseelement
2013-01-21 20:49:28 TRACE nova.manager     compiled_sql, distilled_params
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/", line 1640, in _execute_context
2013-01-21 20:49:28 TRACE nova.manager     context)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/", line 1633, in _execute_context
2013-01-21 20:49:28 TRACE nova.manager     context)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/", line 330, in do_execute
2013-01-21 20:49:28 TRACE nova.manager     cursor.execute(statement, parameters)
2013-01-21 20:49:28 TRACE nova.manager OperationalError: (OperationalError) socket not open

Restarting the compute and network services solves the problem, but until I restart them, the error keeps appearing.

I checked which sockets the compute process has open to the controller:

[root@compute ~]# ps -ef | grep compute
nova     30859     1 27 18:51 ?        00:00:03 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova.conf --logfile /var/log/nova/compute.log
root     30996 30807  0 18:51 pts/0    00:00:00 grep compute

[root@compute ~]# netstat -p | grep 30859
tcp        0      0 compute:56988        controller:postgres     ESTABLISHED 30859/python
tcp        0      0 compute:37869        controller:amqps        ESTABLISHED 30859/python
tcp        0      0 compute:37871        controller:amqps        ESTABLISHED 30859/python
unix  3      [ ]         STREAM     CONNECTED     3588759 30859/python

The process has connections to two services on the controller: postgres and amqps. Then I rebooted the controller and checked again which sockets the process had open to it:

[root@compute ~]# netstat -p | grep 30859
tcp      208      0 compute:56988        controller:postgres     CLOSE_WAIT  30859/python
unix  3      [ ]         STREAM     CONNECTED     3590103 30859/python
unix  3      [ ]         STREAM     CONNECTED     3588759 30859/python

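The postgres socket is now in CLOSE_WAIT: the controller closed its end of the connection, but nova-compute has not closed its own. The amqps connections are gone.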
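
The CLOSE_WAIT state can be reproduced locally with plain TCP sockets. The sketch below is standard-library Python, not nova code, and the function name demo_close_wait is hypothetical. It opens a connection on 127.0.0.1, closes the server side, and shows that the client socket keeps existing until the client application itself reads from it or closes it. Under the simplifying assumption that postgres shut down cleanly during the reboot, this mirrors what netstat shows for the pooled database connection.

```python
import socket


def demo_close_wait():
    """Return what the client reads after the server side has closed."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)

    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()

    # The "controller" side goes away and sends FIN to the client.
    conn.close()
    server.close()

    # The client socket is now in CLOSE_WAIT: the kernel knows the peer
    # closed, but nothing changes until the application notices.
    data = client.recv(1024)  # returns b"" (end-of-file) instead of blocking
    client.close()            # only now is the client's socket released
    return data


if __name__ == "__main__":
    print(repr(demo_close_wait()))  # prints b''
```

In other words, the kernel does not reconnect anything on the application's behalf; some code in the client has to detect the dead socket and open a new one.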

After all services came back up on the controller, I ran the same command again and got the same result.
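
To repeat this check after each controller restart, the netstat output can be filtered with a small script. The sketch below assumes the exact column layout shown in the snapshots above (Proto, Recv-Q, Send-Q, local address, foreign address, state, PID/program); the helper names tcp_connections and stale_connections are hypothetical. It flags any TCP connection from a given PID to the controller that is no longer ESTABLISHED.

```python
# Second snapshot from the question, used as sample input.
AFTER_REBOOT = """\
tcp      208      0 compute:56988        controller:postgres     CLOSE_WAIT  30859/python
unix  3      [ ]         STREAM     CONNECTED     3590103 30859/python
unix  3      [ ]         STREAM     CONNECTED     3588759 30859/python
"""


def tcp_connections(netstat_output, pid):
    """Return (local, remote, state) for each TCP line owned by pid."""
    conns = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) != 7 or not fields[0].startswith("tcp"):
            continue  # skip unix sockets and anything that isn't a TCP row
        _proto, _recv_q, _send_q, local, remote, state, owner = fields
        if owner.split("/", 1)[0] == str(pid):
            conns.append((local, remote, state))
    return conns


def stale_connections(netstat_output, pid, remote_host):
    """Connections from pid to remote_host that are not ESTABLISHED."""
    return [c for c in tcp_connections(netstat_output, pid)
            if c[1].startswith(remote_host + ":") and c[2] != "ESTABLISHED"]


if __name__ == "__main__":
    print(stale_connections(AFTER_REBOOT, 30859, "controller"))
    # [('compute:56988', 'controller:postgres', 'CLOSE_WAIT')]
```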

Why doesn't nova-compute open a new socket to postgres?
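
One general pattern that avoids this situation is for the client to treat a dead pooled connection as disposable: catch the disconnect error (or test the connection before reuse), then open a fresh connection and retry. The sketch below is a hypothetical, database-agnostic illustration using a fake connection class; it is not nova's or SQLAlchemy's code, and the names DisconnectedError, FakeConnection and ReconnectingClient are invented for the example. SQLAlchemy has its own disconnect-handling options (the pool "checkout" event, and pool_pre_ping in release 1.2 and later); whether the installed versions use any of them is not established here.

```python
class DisconnectedError(Exception):
    """Hypothetical stand-in for a driver error such as 'socket not open'."""


class FakeConnection:
    """Toy connection that can be 'killed' to simulate a server restart."""

    def __init__(self, serial):
        self.serial = serial
        self.alive = True

    def execute(self, query):
        if not self.alive:
            raise DisconnectedError("socket not open")
        return "conn%d ran %s" % (self.serial, query)


class ReconnectingClient:
    """Keep one connection; replace it once if it turns out to be dead."""

    def __init__(self, connect):
        self._connect = connect  # factory that opens a new connection
        self._conn = connect()

    def execute(self, query):
        try:
            return self._conn.execute(query)
        except DisconnectedError:
            # The old socket is unusable (like one left in CLOSE_WAIT):
            # drop it, open a new connection and retry exactly once.
            self._conn = self._connect()
            return self._conn.execute(query)


if __name__ == "__main__":
    opened = []

    def connect():
        conn = FakeConnection(len(opened) + 1)
        opened.append(conn)
        return conn

    client = ReconnectingClient(connect)
    print(client.execute("SELECT 1"))  # conn1 ran SELECT 1
    opened[0].alive = False            # simulate the controller reboot
    print(client.execute("SELECT 1"))  # conn2 ran SELECT 1
```

Retrying only once keeps a genuinely unreachable database from turning into an endless reconnect loop; the second failure propagates to the caller.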


  • The socket error you're getting comes from nova-compute trying to reach the database configured in nova.conf, as Matt Joyce pointed out above. Earlier in the log, the service prints every value it is configured with: look for the string "Full set of FLAGS". That output hides the actual value of "sql_connection" (since it typically has a password embedded in it), but the rest of it may still help explain what's happening.

    From what I read in your question, the nova-compute log shows this error until you restart the service. Am I right that it works after that?

    Assuming that's correct, is anything configuring nova after the base packages are installed? For example, a run of Chef, Puppet or the like that adds configuration details after the service has already started with an incorrect configuration?
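
Both suggestions can be checked on the compute node with a short script: print the configured database URL with the password masked, and compare nova.conf's modification time with the nova-compute process start time. This is a Linux-only sketch under several assumptions: nova.conf is an INI file with sql_connection in its [DEFAULT] section (as in releases of that era), and the process start time is derived from /proc/<pid>/stat and the btime line of /proc/stat. The function names and the script usage are hypothetical.

```python
import configparser
import datetime
import os
from urllib.parse import urlsplit, urlunsplit


def masked_sql_connection(conf_text):
    """Return sql_connection from [DEFAULT] with the password replaced."""
    # interpolation=None: passwords may contain '%'.
    # strict=False: tolerate repeated options in the file.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    url = parser.get("DEFAULT", "sql_connection", fallback=None)
    if url is None:
        return None
    parts = urlsplit(url)
    if parts.password is None:
        return url
    netloc = "%s:****@%s" % (parts.username, parts.hostname)
    if parts.port:
        netloc += ":%d" % parts.port
    return urlunsplit(parts._replace(netloc=netloc))


def process_start_time(pid):
    """Start time of a Linux process, from /proc/<pid>/stat and /proc/stat."""
    with open("/proc/%d/stat" % pid) as f:
        stat = f.read()
    # Field 2 (comm) is in parentheses and may contain spaces, so split
    # after the last ')'; the resulting list starts at field 3 (state).
    fields = stat[stat.rindex(")") + 2:].split()
    start_ticks = int(fields[19])  # field 22: starttime, clock ticks since boot
    with open("/proc/stat") as f:
        boot_time = next(int(line.split()[1]) for line in f
                         if line.startswith("btime "))
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    return datetime.datetime.fromtimestamp(boot_time + start_ticks / ticks_per_sec)


def modified_after(path, started):
    """True if the file at path was modified after the datetime started."""
    mtime = datetime.datetime.fromtimestamp(os.path.getmtime(path))
    return mtime > started


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python check_nova_conf.py /etc/nova/nova.conf 30859
    conf_path, pid = sys.argv[1], int(sys.argv[2])
    with open(conf_path) as f:
        print("sql_connection:", masked_sql_connection(f.read()))
    started = process_start_time(pid)
    print("nova-compute started:", started)
    print("nova.conf modified after start:", modified_after(conf_path, started))
```

If the last line prints True, something changed nova.conf after nova-compute had already read it, which would fit the configuration-management theory.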