Tags: kubernetes, load-balancing, ignite

Apache Ignite: 1000s of warnings "Unable to perform handshake within timeout" get added to the log


I recently updated Apache Ignite, which runs in my .NET Core 3.1 application, from 2.7.5 to 2.8.1, and today I noticed thousands of warnings like these in the log:

Jun 03 18:26:54 quote-service-us-deployment-5d874d8546-psbcs org.apache.ignite.internal.processors.odbc.ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:57941]
Jun 03 18:26:59 quote-service-uk-deployment-d644cbc86-7xcvw org.apache.ignite.internal.processors.odbc.ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:57982]
Jun 03 18:26:59 quote-service-us-deployment-5d874d8546-psbcs org.apache.ignite.internal.processors.odbc.ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:57985]
Jun 03 18:27:04 quote-service-uk-deployment-d644cbc86-7xcvw org.apache.ignite.internal.processors.odbc.ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:58050]
Jun 03 18:27:04 quote-service-us-deployment-5d874d8546-psbcs org.apache.ignite.internal.processors.odbc.ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:58051]
Jun 03 18:27:09 quote-service-uk-deployment-d644cbc86-7xcvw org.apache.ignite.internal.processors.odbc.ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:58114]
Jun 03 18:27:09 quote-service-us-deployment-5d874d8546-psbcs org.apache.ignite.internal.processors.odbc.ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:58118] 

I don't use ODBC or JDBC directly in my app, and the app is running in a Kubernetes cluster in a virtual network. Interestingly, in all cases the IP on the other end of the connection (10.250.0.4 here) belongs to the kube-proxy pod. I am a bit perplexed by this.

UPD: The same IP address is also reported for the following pods: azure-ip-masq-agent and azure-cni-networkmonitor (I guess those belong to Azure Kubernetes Service, which I use to run the K8s cluster).

So it is possible that the network monitor is attempting to reach the ODBC port (just guessing). Is there any way to suppress that warning or to disable ODBC connections altogether? I don't use ODBC, but I'd like to keep JDBC connections enabled, as I occasionally connect to the Ignite instances using DBeaver. Thank you!


Solution

  • If you've defined a Service and opened port 10800, Kubernetes will perform health checks against that port through kube-proxy. Each check opens a connection but never completes the client connector handshake, so Ignite logs the "Unable to perform handshake" message.

    ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:58050]

    Here the client connector listener (ClientListenerNioListener) is saying that the peer at remoteAddr=/10.250.0.4:58050 did not complete a handshake within 10 seconds (timeout=10000).

    Client connector configuration: https://apacheignite.readme.io/docs/binary-client-protocol#connectivity
    Client connector handshake: https://apacheignite.readme.io/docs/binary-client-protocol#connection-handshake

    Example of a Service with port 10800 opened:

    apiVersion: v1
    kind: Service
    metadata:
      # The name must be equal to TcpDiscoveryKubernetesIpFinder.serviceName
      name: ignite
      # The name must be equal to TcpDiscoveryKubernetesIpFinder.namespaceName
      namespace: ignite
    spec:
      type: LoadBalancer
      ports:
        - name: rest
          port: 8080
          targetPort: 8080
        - name: sql
          port: 10800
          targetPort: 10800
    

    You can either redefine the Service so that it does not open port 10800, or update the Service definition so that the health check uses a different port: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip

    from the doc:
    service.spec.healthCheckNodePort - specifies the health check node port (numeric port number) for the service. If healthCheckNodePort isn’t specified, the service controller allocates a port from your cluster’s NodePort range. You can configure that range by setting an API server command line option, --service-node-port-range. It will use the user-specified healthCheckNodePort value if specified by the client. It only has an effect when type is set to LoadBalancer and externalTrafficPolicy is set to Local.
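
    As a rough sketch of the second option, assuming the cloud load balancer honors these fields, the Service above could be changed along the following lines. The value 32000 is a hypothetical port picked from the default NodePort range (30000-32767), and the app: ignite selector is an assumed pod label; neither comes from the original setup.

    ```yaml
    apiVersion: v1
    kind: Service
    metadata:
      # The name must be equal to TcpDiscoveryKubernetesIpFinder.serviceName
      name: ignite
      # The name must be equal to TcpDiscoveryKubernetesIpFinder.namespaceName
      namespace: ignite
    spec:
      type: LoadBalancer
      # healthCheckNodePort only has an effect together with this policy
      externalTrafficPolicy: Local
      # Hypothetical value; must be free and inside the cluster's NodePort range
      healthCheckNodePort: 32000
      selector:
        # Assumed label; replace with the label your Ignite pods actually carry
        app: ignite
      ports:
        - name: rest
          port: 8080
          targetPort: 8080
        # Kept open so that JDBC tools such as DBeaver can still connect
        - name: sql
          port: 10800
          targetPort: 10800
    ```

    With externalTrafficPolicy set to Local, kube-proxy serves the health check on healthCheckNodePort, so the probes should no longer need to touch port 10800. Whether they actually move depends on the load balancer implementation, so it is worth watching the Ignite log after applying the change.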