I have a launch script (user data) that runs on startup in AWS on an Ubuntu 16.04 image. When it reaches the step that runs an Ansible playbook, the playbook fails with the error Could not get lock /var/lib/dpkg/lock. When I log in and run the playbook manually it works, but when it runs from the AWS user data it fails with that error.
This is the full error:
TASK [rabbitmq : install packages (Ubuntu default repo is used)] ***************
task path: /etc/ansible/roles/rabbitmq/tasks/main.yml:50
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: root
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1480352390.01-116502531862586 `" && echo ansible-tmp-1480352390.01-116502531862586="` echo $HOME/.ansible/tmp/ansible-tmp-1480352390.01-116502531862586 `" ) && sleep 0'
<localhost> PUT /tmp/tmpGHaVRP TO /.ansible/tmp/ansible-tmp-1480352390.01-116502531862586/apt
<localhost> EXEC /bin/sh -c 'chmod u+x /.ansible/tmp/ansible-tmp-1480352390.01-116502531862586/ /.ansible/tmp/ansible-tmp-1480352390.01-116502531862586/apt && sleep 0'
<localhost> EXEC /bin/sh -c 'LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python /.ansible/tmp/ansible-tmp-1480352390.01-116502531862586/apt; rm -rf "/.ansible/tmp/ansible-tmp-1480352390.01-116502531862586/" > /dev/null 2>&1 && sleep 0'
fatal: [localhost]: FAILED! => {"cache_update_time": 0, "cache_updated": false, "changed": false, "failed": true, "invocation": {"module_args": {"allow_unauthenticated": false, "autoremove": false, "cache_valid_time": null, "deb": null, "default_release": null, "dpkg_options": "force-confdef,force-confold", "force": false, "install_recommends": null, "name": "rabbitmq-server", "only_upgrade": false, "package": ["rabbitmq-server"], "purge": false, "state": "present", "update_cache": false, "upgrade": null}, "module_name": "apt"}, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\" install 'rabbitmq-server'' failed: E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)\nE: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?\n", "stderr": "E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)\nE: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?\n", "stdout": "", "stdout_lines": []}
I ran into the same lock issue. I found that Ubuntu was installing some packages on first boot, and cloud-init did not wait for that to finish before running the user data.
I use the following script to check that the lock file has been free for at least 15 seconds before trying to install anything.
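#!/bin/bash
# Count consecutive seconds during which no process holds the dpkg lock.
i=0
while [ "$i" -lt 15 ]
do
    # fuser exits 0 when some process has the file open; reset the count.
    if fuser /var/lib/dpkg/lock >/dev/null 2>&1; then
        i=0
    fi
    sleep 1
    i=$((i + 1))
done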
I prefer this over a fixed sleep 5m because in an autoscale group the instance may be removed before it's even provisioned.
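One limitation of the loop above is that it waits forever if the lock is never released. A possible variant, sketched below, wraps the same idea in a function with an upper bound on the number of checks. The names `lock_is_held`, `wait_for_dpkg_lock` and `SLEEP_INTERVAL` are hypothetical and not part of any tool; the defaults (15 consecutive free checks, 600 checks in total) are assumptions to adjust.

```shell
#!/bin/bash

# Hypothetical helper: succeeds (exit 0) if some process has the file open.
lock_is_held() {
    fuser "$1" >/dev/null 2>&1
}

# Hypothetical helper: wait until the lock has been free for `quiet`
# consecutive checks; give up with exit status 1 after `timeout` checks.
# Arguments: lock path, quiet count, timeout count (all optional).
wait_for_dpkg_lock() {
    local lock="${1:-/var/lib/dpkg/lock}"
    local quiet="${2:-15}"
    local timeout="${3:-600}"
    local interval="${SLEEP_INTERVAL:-1}"   # seconds between checks (assumed name)
    local free=0 elapsed=0

    while [ "$free" -lt "$quiet" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "gave up waiting for $lock after $elapsed checks" >&2
            return 1
        fi
        if lock_is_held "$lock"; then
            free=0                  # lock is busy: start counting again
        else
            free=$((free + 1))      # one more consecutive free check
        fi
        sleep "$interval"
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example use in user data (the playbook command is whatever you already run):
#   wait_for_dpkg_lock && ansible-playbook ...
```

The non-zero exit status on timeout lets the launch script decide whether to fail fast or retry, rather than hanging until the instance is terminated.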