
Flink: pods is forbidden: User "system:serviceaccount:default:default" cannot watch resource "pods" in API group "" in the namespace "default"


I am following the official Flink tutorial to start a session on native Kubernetes.

First I created a clean new cluster.

However, after running

./bin/kubernetes-session.sh -Dkubernetes.cluster-id=my-first-flink-cluster

I got the following error in the log of the newly created pod my-first-flink-cluster-xxx:

2021-08-14 18:33:02,519 WARN  io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Exec Failure: HTTP 403, Status: 403 - pods is forbidden: User "system:serviceaccount:default:default" cannot watch resource "pods" in API group "" in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_302]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_302]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_302]
2021-08-14 18:33:02,585 INFO  org.apache.flink.kubernetes.kubeclient.resources.KubernetesPodsWatcher [] - The watcher is closing.
2021-08-14 18:33:02,592 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Closing the slot manager.
Exception in thread "OkHttp Dispatcher" java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@b328667 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@31982176[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
    at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
    at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
    at java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:632)
    at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:678)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.scheduleReconnect(WatchConnectionManager.java:305)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$800(WatchConnectionManager.java:50)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:218)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198)
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2021-08-14 18:33:02,624 ERROR org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Fatal error occurred in ResourceManager.
org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: Could not start the ResourceManager akka.tcp://[email protected]:6123/user/rpc/resourcemanager_0
    at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:239) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:181) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:605) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:180) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.actor.Actor.aroundReceive(Actor.scala:517) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.actor.Actor.aroundReceive$(Actor.scala:515) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.12-1.13.1.jar:1.13.1]
Caused by: org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: Cannot initialize resource provider.
    at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager.initialize(ActiveResourceManager.java:156) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:251) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:235) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    ... 22 more
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: pods is forbidden: User "system:serviceaccount:default:default" cannot watch resource "pods" in API group "" in the namespace "default"
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:203) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_302]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_302]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_302]
    Suppressed: java.lang.Throwable: waiting here
        at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:144) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:341) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:755) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:739) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:70) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.watchPodsAndDoCallback(Fabric8FlinkKubeClient.java:227) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.kubernetes.KubernetesResourceManagerDriver.watchTaskManagerPods(KubernetesResourceManagerDriver.java:331) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.kubernetes.KubernetesResourceManagerDriver.initializeInternal(KubernetesResourceManagerDriver.java:103) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.runtime.resourcemanager.active.AbstractResourceManagerDriver.initialize(AbstractResourceManagerDriver.java:81) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager.initialize(ActiveResourceManager.java:154) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:251) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:235) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:181) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:605) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:180) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.actor.Actor.aroundReceive(Actor.scala:517) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.actor.Actor.aroundReceive$(Actor.scala:515) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.12-1.13.1.jar:1.13.1]
2021-08-14 18:33:02,773 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error occurred in the cluster entrypoint.
org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: Could not start the ResourceManager akka.tcp://[email protected]:6123/user/rpc/resourcemanager_0
    at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:239) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:181) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:605) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:180) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.actor.Actor.aroundReceive(Actor.scala:517) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.actor.Actor.aroundReceive$(Actor.scala:515) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.12-1.13.1.jar:1.13.1]
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.12-1.13.1.jar:1.13.1]
Caused by: org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: Cannot initialize resource provider.
    at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager.initialize(ActiveResourceManager.java:156) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:251) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:235) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    ... 22 more
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: pods is forbidden: User "system:serviceaccount:default:default" cannot watch resource "pods" in API group "" in the namespace "default"
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:203) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_302]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_302]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_302]
    Suppressed: java.lang.Throwable: waiting here
        at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:144) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:341) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:755) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:739) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:70) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.watchPodsAndDoCallback(Fabric8FlinkKubeClient.java:227) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.kubernetes.KubernetesResourceManagerDriver.watchTaskManagerPods(KubernetesResourceManagerDriver.java:331) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.kubernetes.KubernetesResourceManagerDriver.initializeInternal(KubernetesResourceManagerDriver.java:103) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.runtime.resourcemanager.active.AbstractResourceManagerDriver.initialize(AbstractResourceManagerDriver.java:81) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager.initialize(ActiveResourceManager.java:154) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:251) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:235) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:181) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:605) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:180) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.actor.Actor.aroundReceive(Actor.scala:517) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.actor.Actor.aroundReceive$(Actor.scala:515) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.12-1.13.1.jar:1.13.1]
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.12-1.13.1.jar:1.13.1]
2021-08-14 18:33:02,838 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Shutting KubernetesSessionClusterEntrypoint down with application status UNKNOWN. Diagnostics Cluster entrypoint has been closed externally..
2021-08-14 18:33:02,876 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint   [] - Shutting down rest endpoint.

And this pod keeps restarting.


Solution

  • After being stuck on this for a long time, I finally got it working. I hope this saves future readers some time.

    The RBAC section of the tutorial says:

    Every namespace has a default service account. However, the default service account may not have the permission to create or delete pods within the Kubernetes cluster. Users may need to update the permission of the default service account or specify another service account that has the right role bound.

    Here is how to create another service account and bind it to the edit cluster role:

    kubectl create serviceaccount flink-service-account
    kubectl create clusterrolebinding flink-role-binding-flink --clusterrole=edit --serviceaccount=default:flink-service-account
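
Before restarting the session, it may help to confirm that the new service account really has the pod permissions the error complained about. The sketch below wraps `kubectl auth can-i` with impersonation (`--as`). The function name `check_sa_pod_access` and the `KUBECTL` override variable are hypothetical helpers introduced here for illustration, and the verb list (create, delete, watch) is an assumption based on the error message and the quoted RBAC note, not an exhaustive list of what Flink needs.

```shell
# KUBECTL can be overridden (for example in a dry run); defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# check_sa_pod_access NAMESPACE SERVICE_ACCOUNT
# Prints one "verb: yes|no" line per verb.
# Returns non-zero if any of the checked verbs is denied.
check_sa_pod_access() {
    local namespace="$1" account="$2" verb answer status=0
    for verb in create delete watch; do
        # `kubectl auth can-i` prints "yes" or "no" and exits non-zero on "no".
        answer="$("$KUBECTL" auth can-i "$verb" pods \
            --namespace "$namespace" \
            --as "system:serviceaccount:${namespace}:${account}" 2>/dev/null)" || status=1
        echo "${verb}: ${answer:-no}"
    done
    return "$status"
}
```

Running `check_sa_pod_access default flink-service-account` after the two commands above should report "yes" for each verb if the binding took effect, while `check_sa_pod_access default default` should show which verbs the default service account is missing.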
    

    After creating the service account, pass one more option, kubernetes.jobmanager.service-account, to the command that starts the session:

    ./bin/kubernetes-session.sh \
        -Dkubernetes.cluster-id=my-first-flink-cluster \
        -Dkubernetes.jobmanager.service-account=flink-service-account
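
To check that the option was picked up, one can read the service account from the JobManager's pod template. This is a minimal sketch that assumes Flink created a Deployment named after the cluster ID (here my-first-flink-cluster) in the default namespace. The function name `jobmanager_service_account` and the `KUBECTL` override variable are hypothetical and exist only for this example.

```shell
# KUBECTL can be overridden (for example in a dry run); defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# jobmanager_service_account CLUSTER_ID [NAMESPACE]
# Prints the service account set in the pod template of the Deployment
# named CLUSTER_ID (assumed to be the JobManager Deployment).
jobmanager_service_account() {
    cluster_id="$1"
    namespace="${2:-default}"
    "$KUBECTL" get deployment "$cluster_id" \
        --namespace "$namespace" \
        --output "jsonpath={.spec.template.spec.serviceAccountName}"
}
```

If the session was started with the command above, `jobmanager_service_account my-first-flink-cluster` would be expected to print flink-service-account.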
    

    All configuration options are listed at https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/#kubernetes

    With this option set, the session starts successfully.