I have a two-node OpenStack setup.
The first node runs the management services such as `nova-api`, `nova-scheduler`, `glance`, ...
The second node runs the network and compute services.
When I check `nova-manage service list`, all services show up.
When I restart the management node (node 1), compute is disconnected.
When compute then tries to connect to the management node, it shows this error in the compute log:
2013-01-21 20:49:28 TRACE nova.manager Traceback (most recent call last):
2013-01-21 20:49:28 TRACE nova.manager File "/usr/lib/python2.6/site-packages/nova/manager.py", line 155, in periodic_tasks
2013-01-21 20:49:28 TRACE nova.manager task(self, context)
2013-01-21 20:49:28 TRACE nova.manager File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 2244, in _heal_instance_info_cache
2013-01-21 20:49:28 TRACE nova.manager context, self.host)
2013-01-21 20:49:28 TRACE nova.manager File "/usr/lib/python2.6/site-packages/nova/db/api.py", line 594, in instance_get_all_by_host
2013-01-21 20:49:28 TRACE nova.manager return IMPL.instance_get_all_by_host(context, host)
2013-01-21 20:49:28 TRACE nova.manager File "/usr/lib/python2.6/site-packages/nova/db/sqlalchemy/api.py", line 103, in wrapper
2013-01-21 20:49:28 TRACE nova.manager return f(*args, **kwargs)
2013-01-21 20:49:28 TRACE nova.manager File "/usr/lib/python2.6/site-packages/nova/db/sqlalchemy/api.py", line 1582, in instance_get_all_by_host
2013-01-21 20:49:28 TRACE nova.manager return _instance_get_all_query(context).filter_by(host=host).all()
2013-01-21 20:49:28 TRACE nova.manager File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/orm/query.py", line 1922, in all
2013-01-21 20:49:28 TRACE nova.manager return list(self)
2013-01-21 20:49:28 TRACE nova.manager File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/orm/query.py", line 2032, in __iter__
2013-01-21 20:49:28 TRACE nova.manager return self._execute_and_instances(context)
2013-01-21 20:49:28 TRACE nova.manager File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/orm/query.py", line 2047, in _execute_and_instances
2013-01-21 20:49:28 TRACE nova.manager result = conn.execute(querycontext.statement, self._params)
2013-01-21 20:49:28 TRACE nova.manager File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1399, in execute
2013-01-21 20:49:28 TRACE nova.manager params)
2013-01-21 20:49:28 TRACE nova.manager File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1532, in _execute_clauseelement
2013-01-21 20:49:28 TRACE nova.manager compiled_sql, distilled_params
2013-01-21 20:49:28 TRACE nova.manager File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1640, in _execute_context
2013-01-21 20:49:28 TRACE nova.manager context)
2013-01-21 20:49:28 TRACE nova.manager File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1633, in _execute_context
2013-01-21 20:49:28 TRACE nova.manager context)
2013-01-21 20:49:28 TRACE nova.manager File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/default.py", line 330, in do_execute
2013-01-21 20:49:28 TRACE nova.manager cursor.execute(statement, parameters)
2013-01-21 20:49:28 TRACE nova.manager OperationalError: (OperationalError) socket not open
Restarting the compute and network services solves the problem, but until I restart them the error keeps occurring.
This is what I see when I check on the compute node which sockets are open to the controller:
[root@compute ~]# ps -ef | grep compute
nova 30859 1 27 18:51 ? 00:00:03 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova.conf --logfile /var/log/nova/compute.log
root 30996 30807 0 18:51 pts/0 00:00:00 grep compute
[root@compute ~]# netstat -p | grep 30859
tcp 0 0 compute:56988 controller:postgres ESTABLISHED 30859/python
tcp 0 0 compute:37869 controller:amqps ESTABLISHED 30859/python
tcp 0 0 compute:37871 controller:amqps ESTABLISHED 30859/python
unix 3 [ ] STREAM CONNECTED 3588759 30859/python
Connections are open to two controller services: `postgres` and `amqps`.
When I run `reboot now` on the controller and check which sockets are still connected to it:
[root@compute ~]# netstat -p | grep 30859
tcp 208 0 compute:56988 controller:postgres CLOSE_WAIT 30859/python
unix 3 [ ] STREAM CONNECTED 3590103 30859/python
unix 3 [ ] STREAM CONNECTED 3588759 30859/python
Here the `postgres` socket is closed (`CLOSE_WAIT`) and the `amqps` connections are gone.
Once all services are back up on the controller, I run the same command to check the sockets connected to the controller and get the same result.
Why does compute not create a new socket for `postgres`?
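The netstat check above can be scripted. This is only a minimal sketch: it assumes the column layout shown in the `netstat -p` output above (protocol, two queue counters, local address, remote address, state, pid/program), and `connection_states` is a hypothetical helper name, not part of any OpenStack tool.

```python
def connection_states(netstat_output, pid):
    """Return (remote_address, state) pairs for TCP sockets owned by pid."""
    results = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only TCP rows have the seven-column layout we rely on here;
        # unix-socket rows are skipped.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        owner_pid = fields[6].split("/")[0]
        if owner_pid == str(pid):
            results.append((fields[4], fields[5]))
    return results


# Sample text copied from the output captured after the controller reboot.
sample = """\
tcp 208 0 compute:56988 controller:postgres CLOSE_WAIT 30859/python
unix 3 [ ] STREAM CONNECTED 3590103 30859/python
unix 3 [ ] STREAM CONNECTED 3588759 30859/python
"""

stale = [remote
         for remote, state in connection_states(sample, 30859)
         if state == "CLOSE_WAIT"]
print(stale)
```

Any remote address listed in `stale` belongs to a connection that the other side has already closed but that the local process is still holding open.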
The socket error that you're getting is from nova-compute attempting to contact the database you have configured in nova.conf, as Matt Joyce pointed out above. Earlier in the log, you can see all the values that the service is configured with. Look for the string "Full set of FLAGS"; that will at least hint at what was configured. The log hides the actual value of "sql_connection" (since it typically has a password embedded in it), but it might still help to explain what's happening there.
From what I'm reading of your question, the nova-compute log file shows this error until you restart the service. Do I read correctly that it works after that?
Assuming that's correct, is there something that is configuring nova after the base packages are installed? A run of chef, puppet or the like that's adding configuration details after the service might have started up with an incorrect configuration?
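As background on why a restart clears the error: a connection opened before the controller rebooted stays broken until the client discards it and connects again. Below is a minimal sketch of that reconnect-and-retry idea. It is not how nova or SQLAlchemy implement it; `FakeConnection`, `StaleConnectionError` and `execute_with_reconnect` are hypothetical names made up for the illustration, and the "server" is just a dictionary.

```python
class StaleConnectionError(Exception):
    """Hypothetical stand-in for a driver error such as 'socket not open'."""


class FakeConnection:
    """Hypothetical connection that breaks once the fake server restarts."""

    def __init__(self, server):
        self.server = server
        # Remember which server "boot" this connection was opened against.
        self.generation = server["generation"]

    def execute(self, statement):
        if self.generation != self.server["generation"]:
            raise StaleConnectionError("socket not open")
        return "ok: " + statement


def execute_with_reconnect(holder, connect, statement, retries=1):
    """Run statement; on a stale connection, reconnect and try again."""
    for attempt in range(retries + 1):
        try:
            return holder["conn"].execute(statement)
        except StaleConnectionError:
            if attempt == retries:
                raise
            # Discard the dead handle and open a fresh connection.
            holder["conn"] = connect()


server = {"generation": 1}


def connect():
    return FakeConnection(server)


holder = {"conn": connect()}
first = execute_with_reconnect(holder, connect, "SELECT 1")

server["generation"] += 1  # simulate the controller reboot
second = execute_with_reconnect(holder, connect, "SELECT 1")
print(first, second)
```

Without the reconnect step the second call would keep failing on the old handle, which matches the behaviour you see until the service is restarted.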