java kubernetes hazelcast

Communication in k8s cluster


I have a k8s cluster with 2 Hazelcast instances and one client application. The goal is to have many clients and at least 2 Hazelcast members. I've set up a LoadBalancer-type service in k8s to expose the Hazelcast instances:

apiVersion: v1
kind: Service
metadata:
  name: hazelcast-service
  labels:
    app: hazelcast-service
spec:
  type: LoadBalancer
  ports:
  - port: 10236
    targetPort: 5701
  selector:
    app: hazelcast 

When the client starts with this config:

clientConfig.getNetworkConfig().addAddress("127.0.0.1:10236");

it recognizes the Hazelcast members:

May 08, 2018 11:25:21 AM com.hazelcast.core.LifecycleService
INFO: hz.client_0 [dev] [3.9.3] HazelcastClient 3.9.3 (20180216 - 539b124) is STARTING
May 08, 2018 11:25:22 AM com.hazelcast.core.LifecycleService
INFO: hz.client_0 [dev] [3.9.3] HazelcastClient 3.9.3 (20180216 - 539b124) is STARTED
May 08, 2018 11:25:22 AM com.hazelcast.client.connection.ClientConnectionManager
INFO: hz.client_0 [dev] [3.9.3] Trying to connect to [127.0.0.1]:10236 as owner member
May 08, 2018 11:25:22 AM com.hazelcast.client.connection.ClientConnectionManager
INFO: hz.client_0 [dev] [3.9.3] Authenticated with server [10.1.0.151]:5701, server version:3.10 Local address: /127.0.0.1:60102
May 08, 2018 11:25:22 AM com.hazelcast.client.spi.impl.ClientMembershipListener
INFO: hz.client_0 [dev] [3.9.3]

Members [2] {
    Member [10.1.0.148]:5701 - b0e4a52f-0170-47f2-8ff3-74d9b67f45f5
    Member [10.1.0.151]:5701 - 1355caa4-5c2b-4366-bd5b-b504f4f0ae4f
}

May 08, 2018 11:25:22 AM com.hazelcast.client.connection.ClientConnectionManager
INFO: hz.client_0 [dev] [3.9.3] Setting ClientConnection{alive=true, connectionId=1, channel=NioChannel{/127.0.0.1:60102->/127.0.0.1:10236}, remoteEndpoint=[10.1.0.151]:5701, lastReadTime=2018-05-08 11:25:22.420, lastWriteTime=2018-05-08 11:25:22.418, closedTime=never, lastHeartbeatRequested=never, lastHeartbeatReceived=never, connected server version=3.10} as owner with principal ClientPrincipal{uuid='28696aaf-e678-47ee-8c7d-a79ba7a0079a', ownerUuid='1355caa4-5c2b-4366-bd5b-b504f4f0ae4f'}
May 08, 2018 11:25:22 AM com.hazelcast.core.LifecycleService
INFO: hz.client_0 [dev] [3.9.3] HazelcastClient 3.9.3 (20180216 - 539b124) is CLIENT_CONNECTED
May 08, 2018 11:25:22 AM com.hazelcast.internal.diagnostics.Diagnostics
INFO: hz.client_0 [dev] [3.9.3] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.

and when it tries to connect to the second instance (10.1.0.151), that also seems to be fine:

May 08, 2018 11:25:29 AM com.hazelcast.core.LifecycleService
INFO: hz.client_1 [dev] [3.9.3] HazelcastClient 3.9.3 (20180216 - 539b124) is STARTING
May 08, 2018 11:25:29 AM com.hazelcast.core.LifecycleService
INFO: hz.client_1 [dev] [3.9.3] HazelcastClient 3.9.3 (20180216 - 539b124) is STARTED
May 08, 2018 11:25:29 AM com.hazelcast.client.connection.ClientConnectionManager
INFO: hz.client_1 [dev] [3.9.3] Trying to connect to [127.0.0.1]:10236 as owner member
May 08, 2018 11:25:29 AM com.hazelcast.client.connection.ClientConnectionManager
INFO: hz.client_1 [dev] [3.9.3] Authenticated with server [10.1.0.148]:5701, server version:3.10 Local address: /127.0.0.1:60113
May 08, 2018 11:25:29 AM com.hazelcast.client.spi.impl.ClientMembershipListener
INFO: hz.client_1 [dev] [3.9.3]

Members [2] {
    Member [10.1.0.148]:5701 - b0e4a52f-0170-47f2-8ff3-74d9b67f45f5
    Member [10.1.0.151]:5701 - 1355caa4-5c2b-4366-bd5b-b504f4f0ae4f
}

May 08, 2018 11:25:29 AM com.hazelcast.client.connection.ClientConnectionManager
INFO: hz.client_1 [dev] [3.9.3] Setting ClientConnection{alive=true, connectionId=1, channel=NioChannel{/127.0.0.1:60113->/127.0.0.1:10236}, remoteEndpoint=[10.1.0.148]:5701, lastReadTime=2018-05-08 11:25:29.455, lastWriteTime=2018-05-08 11:25:29.453, closedTime=never, lastHeartbeatRequested=never, lastHeartbeatReceived=never, connected server version=3.10} as owner with principal ClientPrincipal{uuid='a04aa2ca-626d-4d1a-a366-38c0dbc4781f', ownerUuid='b0e4a52f-0170-47f2-8ff3-74d9b67f45f5'}
May 08, 2018 11:25:29 AM com.hazelcast.core.LifecycleService
INFO: hz.client_1 [dev] [3.9.3] HazelcastClient 3.9.3 (20180216 - 539b124) is CLIENT_CONNECTED
May 08, 2018 11:25:29 AM com.hazelcast.internal.diagnostics.Diagnostics
INFO: hz.client_1 [dev] [3.9.3] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.

but immediately after the above message I get another one (it seems to be a connection problem with the first member my client connected to):

Constructor threw exception; nested exception is com.hazelcast.core.OperationTimeoutException: ClientInvocation{clientMessage = ClientMessage{length=72, correlationId=272, operation=Client.createProxy, messageType=5, partitionId=-1, isComplete=true, isRetryable=false, isEvent=false, writeOffset=0}, objectName = hz:impl:mapService, target = address [10.1.0.151]:5701, sendConnection = null} timed out because exception occurred after client invocation timeout 120000 ms. Current time: 2018-05-08 11:27:29.913. Start time: 2018-05-08 11:25:29.458. Total elapsed time: 120455 ms.

Sometimes it cannot even connect to the first member: I get an OperationTimeoutException after the client reports that it connected to the 10.1.0.151 member. The funny thing is that sometimes it all works fine :( And when I have only one replica of the Hazelcast pod, it works predictably and fine. So I believe the cause is the LoadBalancer service, which distributes requests evenly among the target pods, and that something is wrong with this setup.

I suppose the client should be able to connect to any node it wants, since any node can store the requested item in its map, but I don't know how to set up such a configuration in k8s.

The question is: how should I configure services in k8s so that client apps can talk to all members? Or is that not how it is supposed to work?

Am I missing something?


Solution

  • If your Hazelcast client runs inside the Kubernetes cluster, you don't really need the LoadBalancer type. A simple ClusterIP or headless service would suffice. Hazelcast supports a Kubernetes discovery mode. I suggest trying type ClusterIP, or clusterIP: None for a headless service.
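
As a rough sketch of the headless variant suggested above, the service from the question could be rewritten along these lines. It reuses the question's service name and `app: hazelcast` selector. Exposing port 5701 directly is an assumption, made because a headless service resolves to the pod IPs rather than to a single load-balanced address, so clients would reach the members on their own port. Treat it as a starting point, not a verified configuration:

```yaml
# Headless service: no cluster IP is allocated, DNS returns the pod IPs directly
apiVersion: v1
kind: Service
metadata:
  name: hazelcast-service
  labels:
    app: hazelcast-service
spec:
  type: ClusterIP
  clusterIP: None        # "None" makes the service headless
  ports:
  - port: 5701           # assumed: the member port seen in the logs
    targetPort: 5701
  selector:
    app: hazelcast
```

With a setup like this, a client running inside the cluster would address the members by the service's DNS name or through Hazelcast's Kubernetes discovery, instead of 127.0.0.1:10236.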