Jenkins agent Nodes on OCI Won't Deploy


I am trying to deploy Jenkins with the Jenkins controller and the agent nodes in Oracle Cloud Infrastructure (OCI). I am following step-by-step instructions and videos from Oracle University. However, it fails to deploy agent nodes. The following link looks very similar to the instructions I followed:

https://reachmnadeem.wordpress.com/2019/08/22/jenkins-up-and-running-on-oracle-cloud-infrastructure-oci/

After setting up the OCI plugin, I validated my credentials and confirmed that Jenkins could log in to OCI. I then set up a template and tried to deploy the agent nodes by clicking "Build Executor Status" and then "Provision Oracle Cloud Infrastructure Compute".

My template was trying to deploy Linux agent nodes.

It displays the following message: "Started provisioning node oci-compute-8ddc4d29-cad9-46cd-b565-eed7611d6fc5 with 1 executors" - but it never actually deploys any nodes.

I have listed the error message in the Jenkins logs at the end of this post.

The only differences I found between what I did and the instructions I followed are that the instructions seem to show an older version of Jenkins: one that had the cloud settings on the same page as Configure System instead of on their own page, and whose template asked for both the private and public SSH keys. The current version of Jenkins asks only for the private key.

Please let me know if you have any ideas on how to troubleshoot or fix this. I did not find much useful information when I searched for this issue.

Logs Below

Provisioning new cloud infrastructure instance
Dec 16, 2020 2:47:57 AM INFO com.oracle.bmc.core.ComputeClient setEndpoint
Setting endpoint to https://iaas.us-phoenix-1.oraclecloud.com
Dec 16, 2020 2:47:57 AM WARNING com.oracle.cloud.baremetal.jenkins.BaremetalCloud$ExplicitProvisioner call
Provisioned slave jenkins-192.168.0.11-8ddc4d29-cad9-46cd-b565-eed7611d6fc5 failed!
java.lang.Exception: Instance creation fails because: null
    at com.oracle.cloud.baremetal.jenkins.client.SDKBaremetalCloudClient.createInstance(SDKBaremetalCloudClient.java:237)
    at com.oracle.cloud.baremetal.jenkins.BaremetalCloud.provision(BaremetalCloud.java:230)
    at com.oracle.cloud.baremetal.jenkins.BaremetalCloud.access$100(BaremetalCloud.java:65)
    at com.oracle.cloud.baremetal.jenkins.BaremetalCloud$Provisioner.call(BaremetalCloud.java:222)
    at com.oracle.cloud.baremetal.jenkins.BaremetalCloud$ExplicitProvisioner.call(BaremetalCloud.java:382)
    at com.oracle.cloud.baremetal.jenkins.BaremetalCloud$ExplicitProvisioner.call(BaremetalCloud.java:372)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Solution

  • I found the answer, so I am posting it here. It seems the problem is that a password-protected SSH key isn't supported. Once I generated a new key that wasn't password protected, the "null" error in the logs went away and the slave nodes deployed.

    I found this out by first installing Jenkins again on Oracle Linux inside OCI, where the slave nodes would live. The purpose of this test was to see whether there were any communication issues between the original master and OCI. My original master is NOT in OCI, but I was trying to spin up slaves in OCI. With the master also in OCI, and in the same subnet as the slaves, communication issues between the two could be ruled out as a possible cause. This test still didn't deploy slave nodes, so that wasn't the cause.

    Then I installed the AWS EC2 plugin to see if I could get slave nodes to deploy on AWS. I was trying to rule out a bug in the OCI plugin or a compatibility problem with the version of Jenkins I was running. If that didn't work, I was going to downgrade to an older version of Jenkins, as I was using the most recent version available from the repo for my Linux distro.

    After I installed the AWS EC2 plugin and tried to deploy slave nodes, it gave me an error when validating my credentials, even though I was positive they were correct. I looked in the Jenkins logs and found a more detailed message: "password protected private keys not supported". That made me wonder whether that was the problem with OCI as well, so I generated a new SSH key that wasn't password protected. Slave node creation then worked with both the AWS and OCI clouds.

    Just to be sure this was the solution, I repeated the test using the password-protected key and it failed again. This time, though, I noticed that the OCI error logs also said that password-protected keys weren't supported. I went back to the key without a password and again it worked. So that is the answer: don't use password-protected SSH keys.

    I wasn't quite sure why I got a more detailed error message in the logs this time. I thought the AWS plugin that I installed must have had something to do with it. That wasn't it, because I then installed the AWS plugin on my original master and it still gave the same cryptic error message ("Instance creation fails because: null").

    Then I noticed that the Jenkins version installed on Oracle Linux was 2.271, a more recent version than the 2.263 on my original master (a Debian/Ubuntu distro). There wasn't a newer version of Jenkins available from the Debian/Ubuntu repo I was using. It must be the more recent version of Jenkins that produces the more user-friendly error message, which is what enabled me to figure this out. I am just lucky the repo for Oracle Linux (a Red Hat based distro) had a more recent version of Jenkins, or I would never have figured this out.
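
Since the older Jenkins version only reported "null", it may help to check the key itself before pasting it into the template. The following is a minimal sketch, not part of either plugin: `is_passphrase_protected` is a hypothetical helper name, and it assumes the key is in one of the common text formats (traditional PEM, PKCS#8, or the newer OpenSSH format). It only inspects the file's headers and never decrypts anything.

```python
import base64
import struct

# Magic prefix of the binary payload inside "BEGIN OPENSSH PRIVATE KEY" files.
OPENSSH_MAGIC = b"openssh-key-v1\x00"


def is_passphrase_protected(key_text: str) -> bool:
    """Best-effort check of whether a private key in text form is encrypted.

    Hypothetical helper for illustration; it looks at headers only.
    """
    lines = [ln.strip() for ln in key_text.strip().splitlines()]
    if not lines or not lines[0].startswith("-----BEGIN "):
        raise ValueError("not a PEM-style private key")
    header = lines[0]

    # PKCS#8 encrypted keys say so in the BEGIN line.
    if header == "-----BEGIN ENCRYPTED PRIVATE KEY-----":
        return True

    # Traditional PEM keys mark encryption with a Proc-Type header.
    if any(ln.startswith("Proc-Type:") and "ENCRYPTED" in ln for ln in lines):
        return True

    # OpenSSH-format keys store the cipher name right after the magic prefix;
    # an unencrypted key uses the cipher name "none".
    if header == "-----BEGIN OPENSSH PRIVATE KEY-----":
        body = "".join(ln for ln in lines[1:] if not ln.startswith("-----"))
        blob = base64.b64decode(body)
        if not blob.startswith(OPENSSH_MAGIC):
            raise ValueError("unexpected OpenSSH key layout")
        offset = len(OPENSSH_MAGIC)
        (name_len,) = struct.unpack(">I", blob[offset:offset + 4])
        cipher = blob[offset + 4:offset + 4 + name_len]
        return cipher != b"none"

    return False
```

Calling `is_passphrase_protected(open(path).read())` on the private key file should return `True` for a key like the one that failed above. A replacement key without a password can be generated with `ssh-keygen` by passing an empty passphrase (`-N ""`).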