artifactory, amazon-ecs

Single Node Artifactory - deploy using AWS ECS fails with current node still available


Maybe I'm just approaching this wrong.

Single Instance mode (non-HA)

AWS-RDS Postgres Database

Deploying via ECS

I currently have CI/CD building an Artifactory Pro Docker container and deploying it to ECS. The initial deploy goes fine: everything stands up, the database migrations run, and the instance runs.

However, when I update the task, a new task spins up. It adds entries to access_topology with the new container IP and a unique node ID, but they stay unhealthy. The logs then fill with failure messages (below), apparently because of the existing heartbeat from the other node.

If I first stop the running task and then start a new one, it spins up properly (probably because the old node's heartbeat is lost).

In a typical ECS deployment, the new task is spun up until it's deemed healthy, and then the older task is killed off.

Either scenario leaves orphaned node records that stay marked healthy. I'm also trying to figure out how to garbage-collect and purge those.

Any thoughts on this?

The errors are below. It appears the new node won't join properly because the existing node still has an active heartbeat and the setup is not HA. However, I want this node to stand up so I can take the other one down. Thanks.

Cluster join: Successfully joined jfmd@01es5dmfhar6gcy5abyj4rwpkc with node id ip-10-10-3-248.us-XXXX-1.compute.internal

Application could not be initialized: Current Artifactory node last heartbeat is: 1607609142483. Stopping Artifactory since the local server is running as PRO/OSS but found other servers in registry

Error occurred when refreshing domain cache all domain endpoint failed : Fetch domains from http://localhost:8046/distribution/api/v1/events/domains failed (returned 404), Fetch domains from http://localhost:8046/artifactory/api/events/domains failed (returned 404), [domain_client]"

Retry 20 Elapsed 16.84 secs failed: Couldn't access another access peer. [localhost:8046]. Status code: UNAVAILABLE. HTTP status code 503
Status code: UNAVAILABLE. HTTP status code 503
1607609184634,invalid content-type: text/plain; charset=utf-8
1607609184634,"headers: Metadata(:status=503,content-type=text/plain; charset=utf-8,content-length=19,date=Thu, 10 Dec 2020 14:06:24 GMT)"
1607609184634,DATA-----------------------------
1607609184634,Service Unavailable. Trying again


Solution

  • This is not possible without an HA configuration. Because this is a single-node (non-HA) setup, a new instance will refuse to start while another instance is still "alive". Here, "alive" means the other node has written its heartbeat within the last X seconds (I believe the default is 10).
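
Given that constraint, one possible workaround is to have ECS stop the old task before it starts the new one, rather than the other way around. The sketch below assumes a service with a desired count of 1 and uses boto3's `update_service` call with a deployment configuration of 0% minimum healthy and 100% maximum. The helper names and the cluster/service names are hypothetical, and I have not verified this against Artifactory specifically. The new task may still fail once or twice until the old node's heartbeat goes stale, at which point ECS should retry it.

```python
def stop_before_start_config():
    """Deployment settings that let ECS stop the old task before starting a new one."""
    return {
        # Allow the number of running tasks to drop to zero during a deploy,
        # so the old Artifactory node can be stopped first.
        "minimumHealthyPercent": 0,
        # Never run more than the desired count, so two nodes are not
        # started side by side against the same database.
        "maximumPercent": 100,
    }


def apply_to_service(cluster, service):
    """Apply the settings to an existing ECS service (names are placeholders)."""
    # Imported lazily so the config helper can be used without boto3 installed.
    import boto3

    ecs = boto3.client("ecs")
    return ecs.update_service(
        cluster=cluster,
        service=service,
        deploymentConfiguration=stop_before_start_config(),
    )


if __name__ == "__main__":
    # Print the settings only; call apply_to_service("my-cluster", "artifactory")
    # with your own (hypothetical here) names to change a real service.
    print(stop_before_start_config())
```

The trade-off is a short outage on every deploy, since nothing is serving requests between the old task stopping and the new one becoming healthy. This does not address the orphaned node records mentioned in the question.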