
Traefik with Service Fabric -- failed to connect to Service Fabric server


I have deployed Traefik on my Azure Service Fabric cluster with the following configuration:

# Enable Service Fabric configuration backend
[servicefabric]

# Service Fabric Management Endpoint
clustermanagementurl = "https://localhost:19080"

# Service Fabric Management Endpoint API Version
apiversion = "3.0"

insecureSkipVerify = true

However, when I open the Traefik dashboard I get an empty screen, because Traefik fails to discover any of my Service Fabric applications.

Looking at the Traefik logs on one of my VMs, I'm seeing this error repeatedly:

level=error msg="failed to connect to Service Fabric server Get https://localhost:19080/Applications/?api-version=3.0: x509: certificate is valid for <hidden>.eastus.cloudapp.azure.com, not localhost on https://localhost:19080/Applications/?api-version=3.0"

My Azure Service Fabric cluster has an SSL certificate signed by a trusted CA. (Screenshot: Service Fabric management portal)

How can I solve this issue?


Edit 1:

If it helps, this is the configuration Traefik loads (according to the logs):

{
    "LifeCycle": {
        "RequestAcceptGraceTimeout": 0,
        "GraceTimeOut": 0
    },
    "GraceTimeOut": 0,
    "Debug": true,
    "CheckNewVersion": true,
    "AccessLogsFile": "",
    "AccessLog": null,
    "TraefikLogsFile": "",
    "TraefikLog": null,
    "LogLevel": "DEBUG",
    "EntryPoints": {
        "http": {
            "Network": "",
            "Address": ":80",
            "TLS": null,
            "Redirect": null,
            "Auth": null,
            "WhitelistSourceRange": null,
            "Compress": false,
            "ProxyProtocol": null,
            "ForwardedHeaders": {
                "Insecure": true,
                "TrustedIPs": null
            }
        }
    },
    "Cluster": null,
    "Constraints": [],
    "ACME": null,
    "DefaultEntryPoints": [
        "http"
    ],
    "ProvidersThrottleDuration": 2000000000,
    "MaxIdleConnsPerHost": 200,
    "IdleTimeout": 0,
    "InsecureSkipVerify": true,
    "RootCAs": null,
    "Retry": null,
    "HealthCheck": {
        "Interval": 30000000000
    },
    "RespondingTimeouts": null,
    "ForwardingTimeouts": null,
    "Docker": null,
    "File": null,
    "Web": {
        "Address": ":9000",
        "CertFile": "",
        "KeyFile": "",
        "ReadOnly": false,
        "Statistics": null,
        "Metrics": null,
        "Path": "/",
        "Auth": null,
        "Debug": false,
        "CurrentConfigurations": null,
        "Stats": null,
        "StatsRecorder": null
    },
    "Marathon": null,
    "Consul": null,
    "ConsulCatalog": null,
    "Etcd": null,
    "Zookeeper": null,
    "Boltdb": null,
    "Kubernetes": null,
    "Mesos": null,
    "Eureka": null,
    "ECS": null,
    "Rancher": null,
    "DynamoDB": null,
    "ServiceFabric": {
        "Watch": false,
        "Filename": "",
        "Constraints": null,
        "Trace": false,
        "DebugLogGeneratedTemplate": false,
        "ClusterManagementURL": "https://localhost:19080",
        "APIVersion": "3.0",
        "UseCertificateAuth": false,
        "ClientCertFilePath": "",
        "ClientCertKeyFilePath": "",
        "InsecureSkipVerify": true
    }
}

Edit 2:

Someone suggested using the remote address of my cluster instead of localhost. Doing so results in a different error:

Provider connection error: failed to connect to Service Fabric server Get https://<hidden>.eastus.cloudapp.azure.com:19080/Applications/?api-version=3.0: stream error: stream ID 1; HTTP_1_1_REQUIRED on https://<hidden>.eastus.cloudapp.azure.com:19080/Applications/?api-version=3.0; retrying in 656.765021ms


Solution

  • Thanks to Diego's comment (under my question), I managed to solve this issue with the following additions.

    What was the problem?

    1. My SF cluster is secured and requires a client certificate to log in, but none was specified in the Traefik TOML file. (I wish the logged error were more informative.)
    2. This is visible in the Traefik logs, in the Service Fabric part (look for the trace starting with "Starting provider *servicefabric.Provider"):

      "Watch": false,
      "Filename": "",
      "Constraints": null,
      "Trace": false,
      "DebugLogGeneratedTemplate": false,
      "ClusterManagementURL": "https://localhost:19080",
      "APIVersion": "3.0",
      "UseCertificateAuth": false,      <-------- Important
      "ClientCertFilePath": "",         <-------- Important
      "ClientCertKeyFilePath": "",      <-------- Important
      "InsecureSkipVerify": false
      
      • UseCertificateAuth -- indicates whether Traefik uses a client certificate when it queries the cluster's management endpoint.
      • ClientCertFilePath -- the path of the file containing the public key of the client certificate.
      • ClientCertKeyFilePath -- the path of the file containing the private key of the client certificate.

    (Both paths should be relative to traefik.exe.)
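
    As a rough sketch, the three settings above could be added to the [servicefabric] section from the question like this. The key spellings are inferred from the field names in the logged configuration, and the certificate file names are hypothetical placeholders, so adjust both to your Traefik version and deployment:

    ```toml
    # Enable Service Fabric configuration backend
    [servicefabric]

    # Service Fabric Management Endpoint
    clustermanagementurl = "https://localhost:19080"

    # Service Fabric Management Endpoint API Version
    apiversion = "3.0"

    # Authenticate to the secured cluster with a client certificate
    useCertificateAuth = true

    # Hypothetical file names; paths are relative to traefik.exe
    clientCertFilePath = "certs/client.crt"
    clientCertKeyFilePath = "certs/client.key"
    ```

    After redeploying, the "Starting provider *servicefabric.Provider" trace should show UseCertificateAuth as true and both paths filled in.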


    InsecureSkipVerify

    Traefik's SF config (above) includes a setting called InsecureSkipVerify.

    • InsecureSkipVerify -- If set to false, Traefik rejects the connection to the management endpoint unless its SSL certificate is signed by a trusted CA and is valid for the host name in the URL.
    • This can be an issue if the certificate is issued for the remote address while Traefik uses https://localhost as the cluster's endpoint, because the host name does not match and Traefik prints an error similar to this:

    failed to connect to Service Fabric server Get https://localhost:19080/Applications/?api-version=3.0: x509: certificate is valid for <hidden>.eastus.cloudapp.azure.com, not localhost

    To overcome this, you can either:

    • Set InsecureSkipVerify = true and redeploy
    • Set the management endpoint to the remote address: clustermanagementurl = "https://<hidden>.eastus.cloudapp.azure.com:19080"
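
    Written out as TOML, the two alternatives might look like the sketch below. Only one of them should be active at a time. The key names follow the question's configuration, and <hidden> stands for the redacted cluster name, exactly as in the logs. On a secured cluster, either option still needs the client certificate settings described earlier:

    ```toml
    [servicefabric]
    apiversion = "3.0"

    # Option A: keep localhost and skip verification of the server certificate
    clustermanagementurl = "https://localhost:19080"
    insecureSkipVerify = true

    # Option B: use the address the certificate was issued for and keep verification on
    # clustermanagementurl = "https://<hidden>.eastus.cloudapp.azure.com:19080"
    # insecureSkipVerify = false
    ```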

    Thanks again to Diego for the hint that led me to understand and share the above explanation.