
Nomad and Waypoint cannot launch more than 2 jobs


I am currently trying to deploy several databases on a Nomad cluster (test, dev, qa, ppd). I am using Waypoint with var files to automate the deploys. I have a strange issue: I cannot run more than 2 db jobs. When I launch a new db job, the 2 older jobs disappear and are replaced by the newly launched one.

Waypoint file

# waypoint up -var-file=/opt/waypoint/xx/xx-api/dev/dev.wpvars
project = "xx-db"

# An application to deploy.
app "xx-db" {
    build {
        use "docker" {
            dockerfile = "${path.app}/${var.dockerfile_path}"
        }
        
        
        # Push built images to a remote Docker registry.
        registry {
            use "docker" {
                image = "${var.registry_path}/xx-db-${var.env}"
                tag   = "${var.version}"
            }
        }
    }



    # Deploy to Nomad using a templated jobspec.
    deploy {
        use "nomad-jobspec" {
            jobspec = templatefile("${path.app}/finess-db.hcl", {
                datacenter = var.datacenter
                env        = var.env
            })
        }
    }
}




variable "env" {
    type    = string
    default = ""
}

variable "dockerfile_path" {
    type    = string
    default = "Dockerfile"
}

variable "registry_path" {
    type    = string
    default = "registry.repo.proxy-xx-xx.xx.xx.xx.net"
}

variable "datacenter" {
    type    = string
    default = "xx"
}

variable "version" {
    type    = string
    default = gitrefpretty()
}
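
The variables above are filled in per environment by the var file passed on the command line. For reference, the dev one looks roughly like this (placeholder values; test, qa and ppd follow the same shape):

# /opt/waypoint/xx/xx-api/dev/dev.wpvars (values are placeholders)
env             = "dev"
datacenter      = "xx"
dockerfile_path = "Dockerfile"
registry_path   = "registry.repo.proxy-xx-xx.xx.xx.xx.net"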

The 2 existing jobs (test and formation)

After launching a new job, the older test and formation jobs disappear and are replaced by the new one.

Nomad job file

job "xxx-psqldb-${env}" {
        datacenters = ["xxx"]
        type = "service"
        vault {
                policies = ["xxx"]
                change_mode = "noop"
        }
        update {
                stagger = "30s"
                max_parallel = 1
        }

        group "xxx-psqldb-${env}" {
                count = "1"
                restart {
                        attempts = 3
                        delay = "60s"
                        interval = "1h"
                        mode = "fail"
                }
                network {
                        mode = "host"
                        port "pgsqldb" { to = 5432 }
                }
                task "xxx-psqldb-${env}" {
                        driver = "docker"
                        config {
                                image = "${artifact.image}:${artifact.tag}"
                                ports = [
                                        "pgsqldb"
                                        ]
                                volumes = [
                                    "name=xxxpsqldb${env},io_priority=high,size=5,repl=1:/var/lib/postgresql/data"
                                ]
                                volume_driver = "pxd"

                        }
                        template {
                                data = <<EOH
POSTGRES_USER="{{ with secret "app/xxx/db/admin" }}{{ .Data.data.user }}{{end}}"
POSTGRES_PASSWORD="{{ with secret "app/xxx/db/admin" }}{{ .Data.data.password }}{{end}}"

EOH
                                destination = "secrets/db"
                                env = true
                        }
                        resources {
                                cpu = 256
                                memory = 256
                        }
                        service {
                                name = "xxx-psql-svc-${env}"
                                tags = ["urlprefix-xxx-psql-${env} proto=tcp"]
                                port = "pgsqldb"
                                 check {
                                         name         = "alive"
                                         type         = "tcp"
                                         interval     = "10s"
                                         timeout      = "5s"
                                         port         = "pgsqldb"
                                }

                        }

                }
        }
}

I have the same issue when I launch other jobs for front-end or back-end apps.

Should I configure something in the cluster?

Thanks for any help.


Solution

  • I've encountered a similar issue.

    TL;DR: use -prune=false

    Explanation

    As the Waypoint docs mention:

    ... if -prune=false is not set, Waypoint may delete your job via "pruning" a previous version

    Furthermore, that currently locks you to using the CLI:

    CLI flags are the only way to customize this today

    As described here.

    The issue can also be found on HashiCorp Discuss.
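
    For example, a minimal invocation would look like this (a sketch: I'm assuming the same var file as in the question, and that your Waypoint version accepts the prune flag on waypoint up; if not, pass -prune=false to waypoint deploy instead):

    # keep previously deployed Nomad jobs instead of pruning them
    waypoint up -prune=false -var-file=/opt/waypoint/xx/xx-api/dev/dev.wpvars

    Depending on the Waypoint version there may also be a -prune-retain flag, which keeps a fixed number of old deployments instead of disabling pruning entirely.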