
Ansible re-provisions the same host even though the inventory file is set up correctly


I've been trying to debug this for a while now. I thought I had it working, but then I made some other changes, and now the problem is back again.

Basically, I have Vagrant looping over a list of machine definitions, and while my Ansible inventory looks perfectly fine, I find that only one host is actually being provisioned.

Generated Ansible inventory -- the SSH ports are all different, and the groups are correct

# Generated by Vagrant

kafka.cp.vagrant ansible_host=127.0.0.1 ansible_port=2200 ansible_user='vagrant' ansible_ssh_private_key_file='/workspace/confluent/cp-ansible/vagrant/.vagrant/machines/kafka.cp.vagrant/virtualbox/private_key' kafka='{"broker": {"id": 1}}'
zk.cp.vagrant ansible_host=127.0.0.1 ansible_port=2222 ansible_user='vagrant' ansible_ssh_private_key_file='/workspace/confluent/cp-ansible/vagrant/.vagrant/machines/zk.cp.vagrant/virtualbox/private_key'
connect.cp.vagrant ansible_host=127.0.0.1 ansible_port=2201 ansible_user='vagrant' ansible_ssh_private_key_file='/workspace/confluent/cp-ansible/vagrant/.vagrant/machines/connect.cp.vagrant/virtualbox/private_key'

[preflight]
zk.cp.vagrant
kafka.cp.vagrant
connect.cp.vagrant

[zookeeper]
zk.cp.vagrant

[broker]
kafka.cp.vagrant

[schema-registry]
kafka.cp.vagrant

[connect-distributed]
connect.cp.vagrant

Generated hosts file -- IPs and hostnames are correct

## vagrant-hostmanager-start id: aca1499c-a63f-4747-b39e-0e71ae289576
192.168.100.101 zk.cp.vagrant
192.168.100.102 kafka.cp.vagrant
192.168.100.103 connect.cp.vagrant
## vagrant-hostmanager-end

Ansible playbook I want to run -- the plays correctly correspond to the groups in my inventory

- hosts: preflight
  tasks:
  - import_role:
      name: confluent.preflight
- hosts: zookeeper
  tasks:
  - import_role:
      name: confluent.zookeeper
- hosts: broker
  tasks:
  - import_role:
      name: confluent.kafka-broker
- hosts: schema-registry
  tasks:
  - import_role:
      name: confluent.schema-registry
- hosts: connect-distributed
  tasks:
  - import_role:
      name: confluent.connect-distributed

For any code missing here, see Confluent :: cp-ansible.

The following is a sample of my Vagrantfile. (I made a fork, but haven't committed until I get this working...)

I know that the "if index == machines.length - 1" check should work according to the Vagrant documentation: it starts all the machines and then runs Ansible only once, from the last machine's definition. And it does do that -- it's just that all the tasks end up being executed on the first machine for some reason.

machines = {
  "zk"      => {"ports"=>{2181=>nil}, "groups"=>["preflight", "zookeeper"]},
  "kafka"   => {"memory"=>3072, "cpus"=>2, "ports"=>{9092=>nil, 8081=>nil},
                "groups"=>["preflight", "broker", "schema-registry"],
                "vars"=>{"kafka"=>"{\"broker\": {\"id\": 1}}"}},
  "connect" => {"ports"=>{8083=>nil}, "groups"=>["preflight", "connect-distributed"]}
}

Vagrant.configure("2") do |config|

  if Vagrant.has_plugin?("vagrant-hostmanager")
    config.hostmanager.enabled = true
    config.hostmanager.manage_host = true
    config.hostmanager.ignore_private_ip = false
    config.hostmanager.include_offline = true
  end

  # More info on http://fgrehm.viewdocs.io/vagrant-cachier/usage
  if Vagrant.has_plugin?("vagrant-cachier")
    config.cache.scope = :box
  end

  if Vagrant.has_plugin?("vagrant-vbguest")
    config.vbguest.auto_update = false
  end

  config.vm.box = VAGRANT_BOX
  config.vm.box_check_update = false
  config.vm.synced_folder '.', '/vagrant', disabled: true

  machines.each_with_index do |(machine, machine_conf), index|
    hostname = getFqdn(machine.to_s)

    config.vm.define hostname do |v|
      v.vm.network "private_network", ip: "192.168.100.#{101+index}"
      v.vm.hostname = hostname

      machine_conf['ports'].each do |guest_port, host_port|
        if host_port.nil?
          host_port = guest_port
        end
        v.vm.network "forwarded_port", guest: guest_port, host: host_port
      end

      v.vm.provider "virtualbox" do |vb|
        vb.memory = machine_conf['memory'] || 1536 # Give overhead for 1G default java heaps
        vb.cpus = machine_conf['cpus'] || 1
      end

      if index == machines.length - 1
        v.vm.provision "ansible" do |ansible|
          ansible.compatibility_mode = '2.0'
          ansible.limit = 'all'
          ansible.playbook = "../plaintext/all.yml"
          ansible.become = true
          ansible.verbose = "vv"

          # ... defined host and group variables here

        end # Ansible provisioner
      end # If last machine
    end # machine configuration
  end # for each machine
end 

I set up an Ansible debug task like this:

- debug:
    msg: "FQDN: {{ansible_fqdn}}; Hostname: {{inventory_hostname}}; IPv4: {{ansible_default_ipv4.address}}"

With just that task, notice that ansible_fqdn is always zk.cp.vagrant, which lines up with the fact that only that VM is actually getting provisioned by Ansible.

ok: [zk.cp.vagrant] => {
    "msg": "FQDN: zk.cp.vagrant; Hostname: zk.cp.vagrant; IPv4: 10.0.2.15"
}
ok: [kafka.cp.vagrant] => {
    "msg": "FQDN: zk.cp.vagrant; Hostname: kafka.cp.vagrant; IPv4: 10.0.2.15"
}
ok: [connect.cp.vagrant] => {
    "msg": "FQDN: zk.cp.vagrant; Hostname: connect.cp.vagrant; IPv4: 10.0.2.15"
}

Update with a minimal example: hostname -f reports the same host on every VM, and I assume that's what gather_facts runs to populate ansible_fqdn.

ansible all --private-key=~/.vagrant.d/insecure_private_key --inventory-file=/workspace/confluent/cp-ansible/vagrant/.vagrant/provisioners/ansible/inventory -a 'hostname -f' -f1

zk.cp.vagrant | SUCCESS | rc=0 >>
kafka.cp.vagrant

connect.cp.vagrant | SUCCESS | rc=0 >>
kafka.cp.vagrant

kafka.cp.vagrant | SUCCESS | rc=0 >>
kafka.cp.vagrant
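That output is what pointed me at SSH connection sharing. Ansible reuses OpenSSH ControlMaster sockets named by its control_path setting; a quick plain-shell sketch (hypothetical socket names, not Ansible's actual code) shows why a pattern containing only the host (%h) and remote user (%r) collapses all three Vagrant machines, which share 127.0.0.1 and the vagrant user, into a single socket:

```shell
# Simulate expanding a control_path pattern like "%h-%r" for each
# Vagrant-forwarded SSH port. The port never appears in the pattern,
# so every machine resolves to the SAME socket file, and SSH then
# multiplexes all traffic onto the first connection it established.
host=127.0.0.1
user=vagrant
for port in 2222 2200 2201; do
  echo "port ${port} -> socket ${host}-${user}"
done
```

All three iterations print the identical socket name, mirroring how all three hosts ended up talking to one VM.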

Solution

  • It turns out I can get around the problem by not having this section in my ansible.cfg. With it in place, all three VMs share one SSH ControlMaster socket: they all have ansible_host=127.0.0.1 and the vagrant user, and the control_path pattern below includes only the host (%h) and user (%r) but not the port, so SSH multiplexes every connection onto whichever VM was reached first.

    [ssh_connection]
    control_path = %(directory)s/%%h-%%r
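Alternatively, rather than dropping SSH multiplexing entirely, adding the port token should also avoid the collision (a sketch; %p is the standard OpenSSH ControlPath token for the remote port, and % must be doubled in ansible.cfg):

```ini
[ssh_connection]
# Each host:port:user combination now gets its own control socket
control_path = %(directory)s/%%h-%%p-%%r
```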