I add the metadata "metering.server_group":"corey-group" to an instance at creation time and confirm with nova show that it is applied. When I then check the Gnocchi resource with gnocchi resource show --type instance ${instance-id}, the attribute server_group is None at the beginning, but after a while it gets applied (always on the hour, e.g. 07:00, 08:00...). I had no idea what was happening, and I think this will cause Gnocchi to aggregate incorrect datasets, so I spent some time troubleshooting it.
First of all, the attributes of a Gnocchi resource are stored in the database:
MariaDB [(none)]> use gnocchi
MariaDB [gnocchi]> select * from resource_type where name='instance';
# check its tablename, ex: rt_xxxxxx
MariaDB [gnocchi]> select * from rt_xxxxxx where display_name='corey-vm';
+--------------+--------------------+-----------+--------------------------------------+--------------+--------+--------------+
| display_name | host               | image_ref | flavor_id                            | server_group | id     | flavor_name  |
+--------------+--------------------+-----------+--------------------------------------+--------------+--------+--------------+
| corey-vm     | corey-test-com-001 | NULL      | 26e46b4c-23bd-4224-a609-29bd3094a18e | NULL         | xxxxxx | corey-flavor |
+--------------+--------------------+-----------+--------------------------------------+--------------+--------+--------------+
As you can see, the column server_group should be corey-group, but it is always NULL when the instance has just been created, and it seems that Ceilometer updates the resource once per hour, on the hour.
I added some logging to the file ceilometer/publisher/gnocchi.py and found that it updates the resource every minute, but the variable resource_extra contains server_group only on the hour; that's why it is None at the beginning.
Here are some parts of the logs:
2020-11-09 11:59:15 DEBUG ceilometer.publisher.gnocchi Resource {'host': u'test-com-002', 'display_name': u'vm-001', 'flavor_id': u'xxx', 'flavor_name': u'xxx'} publish_samples /usr/lib/python2.7/site-packages/ceilometer/publisher/gnocchi.py:345
2020-11-09 12:00:15 DEBUG ceilometer.publisher.gnocchi Resource {'host': u'test-com-002', 'display_name': u'vm-001', 'flavor_name': u'xxx', 'server_group': 'corey-group'} publish_samples /usr/lib/python2.7/site-packages/ceilometer/publisher/gnocchi.py:345
2020-11-09 12:01:15 DEBUG ceilometer.publisher.gnocchi Resource {'host': u'test-com-002', 'display_name': u'vm-001', 'flavor_id': u'xxx', 'flavor_name': u'xxx'} publish_samples /usr/lib/python2.7/site-packages/ceilometer/publisher/gnocchi.py:345
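To make the pattern in these lines easier to spot over a longer log, here is a minimal Python sketch that pulls the timestamp and the logged resource dict out of each line and reports whether server_group was present. It assumes the log format is exactly the one shown above (the "Resource {...} publish_samples" text comes from my own added debug statement, not from stock Ceilometer), and the function name server_group_by_timestamp is just an illustrative name.

```python
import ast
import re

# The three sample lines from the log excerpt above.
LOG_LINES = [
    "2020-11-09 11:59:15 DEBUG ceilometer.publisher.gnocchi Resource {'host': u'test-com-002', 'display_name': u'vm-001', 'flavor_id': u'xxx', 'flavor_name': u'xxx'} publish_samples /usr/lib/python2.7/site-packages/ceilometer/publisher/gnocchi.py:345",
    "2020-11-09 12:00:15 DEBUG ceilometer.publisher.gnocchi Resource {'host': u'test-com-002', 'display_name': u'vm-001', 'flavor_name': u'xxx', 'server_group': 'corey-group'} publish_samples /usr/lib/python2.7/site-packages/ceilometer/publisher/gnocchi.py:345",
    "2020-11-09 12:01:15 DEBUG ceilometer.publisher.gnocchi Resource {'host': u'test-com-002', 'display_name': u'vm-001', 'flavor_id': u'xxx', 'flavor_name': u'xxx'} publish_samples /usr/lib/python2.7/site-packages/ceilometer/publisher/gnocchi.py:345",
]

# Assumed line layout: "<date> <time> ... Resource {<dict repr>} publish_samples ..."
PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .* Resource (\{.*\}) publish_samples"
)


def server_group_by_timestamp(lines):
    """Return a list of (timestamp, server_group or None) for matching lines."""
    result = []
    for line in lines:
        match = PATTERN.match(line)
        if not match:
            continue  # not one of the added debug lines
        # The dict is a Python repr, so literal_eval can parse it safely.
        attrs = ast.literal_eval(match.group(2))
        result.append((match.group(1), attrs.get("server_group")))
    return result


if __name__ == "__main__":
    for timestamp, group in server_group_by_timestamp(LOG_LINES):
        print(timestamp, "server_group =", group)
```

Run against a full log file, this should show at a glance which publish cycles carried server_group and which did not.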
But I am stuck at this point: I can't understand why the variable resource_extra doesn't get server_group every time. What exactly causes this? (Running on Queens)
I would appreciate any ideas.
Update 09/11/2020
After some days of troubleshooting, I still can't find the root cause.
But I found a command to apply 'server_group' manually, which helps me keep Gnocchi from aggregating incorrect datasets.
Here it is:
gnocchi resource update --type instance -a server_group:corey-group ${resource_id}
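If several instances need the same fix, the command above could be wrapped in a small script. The following is only a sketch: the resource IDs in PENDING are placeholders, the helper name build_update_command is made up for illustration, and by default it only prints the commands rather than running them.

```python
import shlex
import subprocess

# Hypothetical mapping of Gnocchi resource IDs to the server_group they should carry.
PENDING = {
    "resource-id-1": "corey-group",
    "resource-id-2": "corey-group",
}


def build_update_command(resource_id, server_group):
    """Build the argv for the manual 'gnocchi resource update' workaround."""
    return [
        "gnocchi", "resource", "update",
        "--type", "instance",
        "-a", "server_group:{}".format(server_group),
        resource_id,
    ]


def apply_updates(pending, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for resource_id, server_group in pending.items():
        cmd = build_update_command(resource_id, server_group)
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            # Requires the gnocchi client and OpenStack credentials in the environment.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply_updates(PENDING, dry_run=True)
```

Keeping the dry run as the default makes it easy to review the generated commands before touching any resource.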
Update 11/11/2020
I tried to grep for the integer 3600 and change each occurrence to 300, but nothing changed. Below is what I've tried.
/etc/ceilometer/ceilometer.conf
[compute]
resource_cache_expiry = 300
ceilometer/compute/discovery.py
cfg.IntOpt('resource_cache_expiry',
default=300,
ceilometer/publisher/zaqar.py
DEFAULT_TTL = 300
Update 12/11/2020
I can't reproduce this issue on Pike.
Maybe you can refer to the following discussions:
According to the reference, try changing instance_discovery_method from its default "libvirt_metadata" to "naive" in the Ceilometer config file, like this:
[compute]
instance_discovery_method = naive
Switching to "naive" resolves this issue; however, it generates additional load on the Nova API for metadata retrieval.
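To double-check which discovery method a compute node is actually configured with, something like the following Python sketch may help. It assumes ceilometer.conf can be read by the standard configparser module (usually true for simple INI-style files, but not guaranteed for every oslo.config file), and it treats "libvirt_metadata" as the value in effect when the option is absent, as described above. The function name discovery_method and the sample text are illustrative only.

```python
import configparser

# Default named in the answer above; used when the option is not set.
DEFAULT_METHOD = "libvirt_metadata"

SAMPLE_CONF = """
[compute]
instance_discovery_method = naive
"""


def discovery_method(conf_text):
    """Return the [compute] instance_discovery_method from INI-style text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # fallback covers both a missing [compute] section and a missing option.
    return parser.get("compute", "instance_discovery_method", fallback=DEFAULT_METHOD)


if __name__ == "__main__":
    # On a real node you would read /etc/ceilometer/ceilometer.conf instead.
    print(discovery_method(SAMPLE_CONF))
```

After changing the option, the Ceilometer compute agent presumably needs a restart before the new method takes effect.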