I am trying to install the CloudWatch agent on Amazon Linux 2 instances at startup, using EC2 user data. For some reason, after cloud-init has finished running, all services get restarted and the configuration file I put in the CloudWatch agent folder is no longer there.
I am using a custom AMI pre-built with Packer. My configuration file is placed at /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json from an Ansible template; it is the configuration I want to use and holds all the metrics and logs I want to send. At startup, after the agent installation, I copy it to /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json.
Here is my userdata script:
#!/bin/bash
yum install amazon-cloudwatch-agent -y
cp /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
After startup has finished, I can see that the script ran correctly. If I run cat /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log
I see the following:
2021/07/16 13:33:46 I! I! Detected the instance is EC2
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
Configuration validation first phase succeeded
2021/07/16 13:33:46 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
2021/07/16 13:33:46 I! Detected runAsUser: root
2021/07/16 13:33:46 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to root:root
2021-07-16T13:33:46Z I! Starting AmazonCloudWatchAgent 1.247347.4
2021-07-16T13:33:46Z I! Loaded inputs: netstat diskio logfile mem net processes swap cpu disk
2021-07-16T13:33:46Z I! Loaded aggregators:
2021-07-16T13:33:46Z I! Loaded processors: delta ec2tagger
2021-07-16T13:33:46Z I! Loaded outputs: cloudwatch cloudwatchlogs
2021-07-16T13:33:46Z I! Tags enabled: host=ip-XX-XX-X-XXX.eu-west-1.compute.internal
2021-07-16T13:33:46Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-XX-XX-X-XXX.eu-west-1.compute.internal", Flush Interval:1s
2021-07-16T13:33:46Z I! [logagent] starting
2021-07-16T13:33:46Z I! [logagent] found plugin cloudwatchlogs is a log backend
2021-07-16T13:33:46Z I! [logagent] found plugin logfile is a log collection
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
=======> 2021-07-16T13:33:46Z I! cloudwatch: get unique roll up list [[AutoScalingGroupName] [InstanceId InstanceType] []]
2021-07-16T13:33:46Z I! cloudwatch: publish with ForceFlushInterval: 30s, Publish Jitter: 11s
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
=======> 2021-07-16T13:33:47Z I! [logagent] piping log from APP-DEV-php-errors-logs/XX.XX.X.XXX(/var/log/php-fpm/error.log) to cloudwatchlogs
2021-07-16T13:33:54Z I! Profiler is stopped during shutdown
2021-07-16T13:33:54Z I! [agent] Hang on, flushing any cached metrics before shutdown
2021/07/16 13:33:55 I! I! Detected the instance is EC2
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default ...
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded
2021/07/16 13:33:55 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default ...
2021/07/16 13:33:55 I! Detected runAsUser: cwagent
2021/07/16 13:33:55 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 994:992
2021/07/16 13:33:55 I! Set HOME: /home/cwagent
2021-07-16T13:33:55Z I! Starting AmazonCloudWatchAgent 1.247348.0
2021-07-16T13:33:55Z I! Loaded inputs: disk mem
2021-07-16T13:33:55Z I! Loaded aggregators:
2021-07-16T13:33:55Z I! Loaded processors: ec2tagger
2021-07-16T13:33:55Z I! Loaded outputs: cloudwatch
2021-07-16T13:33:55Z I! Tags enabled: host=ip-XX-XX-X-XXX.eu-west-1.compute.internal
2021-07-16T13:33:55Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-XX-XX-X-XXX.eu-west-1.compute.internal", Flush Interval:1s
2021-07-16T13:33:55Z I! [logagent] starting
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
=======> 2021-07-16T13:33:55Z I! cloudwatch: get unique roll up list []
2021-07-16T13:33:55Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 26s
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
2021-07-16T13:39:07Z I! [processors.ec2tagger] ec2tagger: Refresh is no longer needed, stop refreshTicker.
As you can see, the initial command from the user data runs fine, and custom metrics and logs are collected (see the =======> marks before the relevant lines).
However, a few seconds later, once cloud-init is over, the CloudWatch agent is restarted by systemd and the file amazon-cloudwatch-agent.json is absent from the filesystem, so the agent runs with default parameters. I do not know what triggers either the restart or the removal of the file.
If I rerun the command manually after startup, everything works fine, but of course I need it automated for when autoscaling launches new instances.
I have tried launching the CloudWatch agent directly with systemd, making the config file read-only, and only fetching the config and letting the system start the agent itself, but the problem persists.
Thank you for your help
The preinstalled ssm-agent conflicts with the CloudWatch agent. Uninstall ssm-agent during the Packer build:
sudo yum erase amazon-ssm-agent --assumeyes
I finally found out that the newly installed CloudWatch agent conflicts with the SSM agent installed by default in the Amazon Linux 2 image. I first tried an ugly workaround: replacing the ExecStart line of the amazon-cloudwatch-agent service using sed in the user data:
sed -i '/ExecStart/c\ExecStart=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json' /etc/systemd/system/amazon-cloudwatch-agent.service
That way, when the service gets restarted after instance startup, it would use my custom configuration. However, I then found out that the service file also got replaced after cloud-init ended.
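For anyone wanting to try the same substitution, here is a minimal sketch that applies it to a scratch unit file rather than the real one, so the result can be inspected first. The scratch unit contents and the function name rewrite_exec_start are made up for illustration, and the one-line c\ form assumes GNU sed (which Amazon Linux 2 ships).

```shell
#!/bin/bash
# Sketch: apply the ExecStart substitution to a scratch unit file.
# The unit contents below are a placeholder, not the real packaged unit.

CTL=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl
CUSTOM_CONFIG=/opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json

# Replace every line containing "ExecStart" in the given unit file
# (hypothetical helper name; GNU sed one-line "c\" form).
rewrite_exec_start() {
  local unit_file=$1
  sed -i "/ExecStart/c\\ExecStart=${CTL} -a fetch-config -m ec2 -s -c file:${CUSTOM_CONFIG}" "$unit_file"
}

scratch_unit=$(mktemp)
cat > "$scratch_unit" <<'EOF'
[Unit]
Description=Placeholder unit for illustration

[Service]
Type=simple
ExecStart=/usr/bin/placeholder-command
Restart=on-failure
EOF

rewrite_exec_start "$scratch_unit"

# Show the rewritten line so it can be checked by eye.
grep '^ExecStart=' "$scratch_unit"
```

Only the ExecStart line changes; the other lines of the unit are left as they were. On a real instance, systemd would also need a daemon-reload before it picks up an edited unit file.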
Reviewing the system messages, I noticed that ssm-agent was reloading some configuration after cloud-init ended, so I assumed it could be the culprit. I ended up uninstalling it in the Packer build that creates my AMI, so that it is not present at instance startup, and my configuration was no longer overwritten.
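A quick way to confirm from the agent log alone whether the custom configuration was swapped out is to look at the last file the agent read from its amazon-cloudwatch-agent.d directory. The sketch below relies only on the "Reading json config file path" lines shown in the question; the function names are made up for illustration, and the default log path is an assumption based on the logs directory that appears in that output.

```shell
#!/bin/bash
# Sketch: report which JSON file under amazon-cloudwatch-agent.d the agent
# loaded most recently, based on the "Reading json config file path" lines.

# Print the basename of the last file read from the .d directory.
last_loaded_config() {
  local log_file=$1
  grep -o 'amazon-cloudwatch-agent\.d/[^ ]*' "$log_file" | tail -n 1 | sed 's|.*/||'
}

# Succeed if the most recent start used the "default" config file.
fell_back_to_default() {
  [ "$(last_loaded_config "$1")" = "default" ]
}

# Usage: pass a log path, or fall back to the assumed standard location.
log=${1:-/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log}
if [ -r "$log" ] && fell_back_to_default "$log"; then
  echo "Agent last started with the default config; the custom config was replaced."
fi
```

With the log from the question, the first start reads file_amazon-cloudwatch-agent.json and the second reads default, so the check would flag the second start.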
Note that I do not have a deep understanding of how ssm-agent works, and there is probably a proper way to set up the CloudWatch agent using some SSM configuration. Since we do not currently use SSM and I do not have enough time to study this option, I chose this compromise.
If someone can come up with a cleaner solution, using ssm-agent through an automated method, this would be greatly appreciated.